Dynamic compiler and method of compiling code to generate dominant path and to handle exceptions

ABSTRACT

A dynamic compiler and method of compiling code to generate a dominant path and handle exceptions. The dynamic compiler includes an execution history recorder that is configured to record the number of times a fragment of code is interpreted. When the code is interpreted a threshold number of times, the code is queued for compilation. The execution history recorder also keeps track of where transfer of control came from and where transfer of control goes to for each fragment of code that is executed, thereby allowing for compilation of a dominant path of code. If the execution of code deviates from the dominant path of compiled code (such as when an exception occurs), a fallback interpreter is utilized to interpret the fragment of code to be executed.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This is a continuation of International Application PCT/GB99/00788, filed on Mar. 16, 1999, which claims priority to U.K. Patent Application GB9825102.8, filed on Nov. 16, 1998, now abandoned.

[0002] Computer System, Computer-Readable Storage Medium and Method of Operating Same, and Method of Operating that System

[0003] This invention relates, in its most general aspects, to a computer system and to a method of operating that system, and to improvements in the performance of various operations within such a system. It also relates to a computer-readable storage medium. The computer system may be, may include, or may be part of, a virtual machine. The computer-readable storage medium may contain executable code or other instructions for programming the computer system/virtual machine.

[0004] In recent years, there have been developments in programming languages towards what is known as an object-oriented language. In these developments, concepts are regarded as ‘objects’, each carrying with it a set of data, or attributes, pertinent to that object, as well as information relating to so-called ‘methods’, that is functions or sub-routines, that can be performed on that object and its data. This is well known to those skilled in the art of computing and/or programming.

[0005] The advent and rapid advancement in the spread and availability of computers has led to the independent development of different types of systems, such as the IBM and IBM-compatible PC running IBM-DOS or MS-DOS or MS-Windows applications, the Apple Macintosh machines running their own Apple System operating system, or various Unix machines running their own Unix operating systems. This proliferation of independent systems has led to useful applications being available only in one format and not being capable of running on a machine for which the application was not designed.

[0006] Under such circumstances, programmers have devised software which ‘emulates’ the host computer's operating system so that a ‘foreign’ application can be made to run successfully in such a way that, as far as the user is concerned, the emulation is invisible. In other words, the user can perform all of the normal functions of say a Windows-based application on a Unix machine using a Unix-based operating system without noticing that he is doing so.

[0007] A particularly notable product of this type is that developed by Insignia Solutions of High Wycombe, GB and Santa Clara, Calif., USA and known under the name ‘SoftWindows 2.0 for Powermac’. This software enables a physical Macintosh computer to emulate a PC having an Intel 80486DX processor and 80487 maths co-processor plus memory, two hard disks, IBM-style keyboard, colour display and other features normally found on recent versions of the PC-type of computer.

[0008] Furthermore, there is an ever-increasing demand by the consumer for electronics gadgetry, communications and control systems which, like computers, have developed independently of one another and have led to incompatibility between operating systems and protocols. For example, remote-control devices for video players, tape players and CD players have similar functions, analogous to ‘play,’ ‘forward,’ ‘reverse,’ ‘pause,’ etc, but the codes for transmission between the remote control, or commander, operated by the user may not be compatible either between different types of equipment made by the same manufacturer or between the same types of equipment made by different manufacturers. There would be clear benefits of having software within the equipment which can produce for example the correct ‘play’ code based upon a ‘play’ command regardless of the specific hardware used in the equipment. Such software is commonly known as a ‘Virtual Machine.’

[0009] Other uses and applications are legion: for example, set-top boxes for decoding television transmissions, remote diagnostic equipment, in-car navigation systems and so-called ‘Personal Digital Assistants.’ Mobile telephones, for instance, can have a system upgrade downloaded to them from any service provider.

[0010] Emulation software packages tend to have certain features in common, notably that they are not general purpose but are dedicated. They are of most benefit in rapid development areas and have a distinct advantage in enabling manufacturers to cut costs. In particular, they can divorce software from the physical machine, i.e., the effect of the software in the physical machine can be altered by the emulating software without having to go into the machine's native software to implement those changes.

[0011] The specific object-oriented language used in some of the implementations described later is that known as Java (registered trademark of Sun Microsystems Corporation). Some of the following implementations will enable Java to be used in smaller devices than is currently possible because of the improved performance and/or reduced memory footprint. Future uses projected for embedded software (virtual machines) include computers worn on the body, office equipment, household appliances, and intelligent houses and cars.

[0012] While it is recognised that there are clear advantages in the use of virtual machines, especially those using object-oriented languages, there are naturally areas where it is important and/or beneficial for some of the operations that are carried out within the system to be optimised. These may include reducing the memory requirement, increasing the speed of operation, and improving the ‘transparency’ of the system when embedded in another system. One of the principal aims of the inventions described herein is to provide a Virtual Machine which is optimised to work as quickly as possible within a memory constraint of, for example, less than 10, 5, 2 or even 1 Mbyte. Such a constraint is likely to be applicable, for example, to electronics gadgetry and other equipment where cost (or size) is a major constraint.

[0013] Reference will be made, where appropriate, purely by way of example, to the accompanying figures of the drawings (which represent schematically the above improvements) in which:

[0014] FIG. 1 shows certain components of the virtual machine.

GENERAL CONSIDERATIONS

[0015] A specific example of a preferred embodiment of virtual machine is now described with reference to FIG. 1.

[0016] The virtual machine 20 is an executable code installed in the particular item of equipment 22. It can provide a degree of independence from the hardware and operating system. The virtual machine may typically include any, some, or all of the following features: an operating engine, a library of routines, one or more interpreters, one or more compilers, storage means for storing a plurality of instruction sequences, queue management means, and buffer management means.

[0017] The virtual machine is coupled to one or more applications 24 on one side (the “high level” side), and, on the other side (the “low level” side), perhaps via various intermediate logical units, to the hardware 26 of the item of equipment. The hardware can be regarded as including various ports or interfaces 28 (perhaps an interface for accepting user input); the virtual machine receives events from those ports or interfaces. The hardware also includes one or more processors/control means 30 and memory 32.

[0018] Agent's Reference No. 1 - Computer System, Computer-Readable Storage Medium and Method of Operating Same, and Method of Operating that System

[0019] The present invention relates to a computer system and to a method of operating a computer system. In particular, the invention relates to computer systems including a compiler for compiling code for execution. In a preferred embodiment, the invention relates to Dynamic Compilation of the Dominant Path.

[0020] This invention is preferably related to the optimisation of the runtime representation of object-oriented computer languages by means of runtime compilation technology. Aspects of the invention are related to optimised execution of virtual machines, and in particular Java virtual machines.

[0021] The invention relates in particular to trace scheduling, optimising compilers, dynamic compilation, profile guided optimisations, just in time compilers and the Java VM specification.

[0022] In some applications, for example using the Java language, code may be interpreted directly using an interpreter. The interpreter translates the code during execution and thus, the interpretation of code can be very slow. The execution of compiled code is therefore preferred since such execution is generally significantly faster than interpretation.

[0023] Standard compilers translate all of the code of an application to give a complete compiled runtime representation of the code for execution. Such standard compilation is time consuming, especially where optimisation of the compiled code is desired, and is usually carried out off-line before execution of the code.

[0024] The Just-in-Time (JIT) compiler provides on-line compilation of code. For example, using a JIT compiler, when a method is first encountered in the execution of the code, the execution is stopped and the JIT compiler compiles the whole of the method, optimising where possible. Thus the JIT compiler compiles the whole method, including parts of the method which are unlikely to be used. Such compilation wastes time in compilation and the compiled version of the code takes up space in the memory. This can present a particular problem for an embedded application where minimising the use of memory is of importance.

[0025] Generally, compilers of the runtime representation of computer languages and in particular so-called Just-in-time (JIT) compilers, compile the representation of a whole method at a time, or a larger unit (for example, a file or one of many classes at a time). Often a significant portion of an application relates to handling exceptional situations, or rarely executed code. Typically, the compiler blocks any further progress of the application until the compilation completes.

[0026] The conventional compilation approach therefore spends time compiling code which is rarely executed, and the compiled result occupies space which would not have been needed if the rarely executed code were not present. Optimisation opportunities are often reduced by having to cater for control paths through the rarely executed code.

[0027] Offline compilers which use profile input from a previous run of the application can often optimise the frequently executed paths of an application to mitigate the latter problem. However they still must compile every path through the application, and cannot easily react when an application exhibits different behaviour from that of the profile run.

[0028] For the JIT compiler, when the ‘invoke’ instruction for a method is encountered, control is passed to the JIT compiler and, if the method has not previously been compiled, a compiled version is created. The compiled version is then used for the subsequent execution of the method. Once the budgeted memory available to the JIT compiler is used, the compilation of new methods is not possible and the use of the JIT compiler ceases. Methods subsequently found will be interpreted, thus slowing subsequent execution of the non-compiled code.

[0029] The amount of memory available to the compiler varies depending on the computer system used. The overall memory allocated to the compiler includes the code buffer space, the space allocated to the compiler for building required internal data structures and for register allocation. That memory is usually set aside for the compiler prior to compilation.

[0030] JIT compilers were designed for use on desktop computer systems having plenty of memory. The memory allocated to the compiler is generally so great that the amount of buffer space available to the compiler is, in practice, unlimited.

[0031] For embedded systems, however, the amount of memory allocated to the compiler might be 70 or 80K. Clearly, that imposes constraints on the amount of code that may be compiled.

[0032] In summary, the invention described in this application involves any, some or all of the following features, in any combination:

[0033] 1. Compile fragments of code for the dominant path rather than whole methods.

[0034] 2. Use execution history to determine which paths through the application are the dominant ones.

[0035] 3. Use a fallback interpreter to interpret infrequently executed code.

[0036] 4. Have an online compilation system which can compile code on demand as the application executes. This system does not block progress of the application. The system runs as a separate thread, whose priority is adaptive.

[0037] 5. Have the ability to incorporate new fragments of code into a running multi-threaded system.

[0038] 6. Support removal of fragments of code from a running multi-threaded system.

[0039] 7. Constrain the amount of memory used by the dynamic compiler during its execution at any time.

[0040] The invention described in this application aims to, among other things, reduce the performance impact of online compilation, generate code which is optimised for the dominant paths through an application, allow better optimisation of code, within time and memory constraints, reduce the storage overhead of compiled code which is rarely executed, improve application responsiveness in a multi-threaded computer system, and reduce the amount of memory used by the compiler itself.

[0041] According to the present invention, there is provided a computer system including a compiler for compiling the code of an application, wherein the compiler is arranged to compile a fragment of the code.

[0042] By compiling only fragments of code rather than whole methods, it is made possible to compile only the most desirable sections of code, leaving the less desirable fragments uncompiled.

[0043] By this method, the compilation may be made more efficient as only those fragments required are compiled. Also, the memory of the system need not be filled with compiled versions of rarely executed code.

[0044] Where reference is made to a fragment of code, it preferably refers to a section of code which represents less than a whole method. Preferably the fragment of code includes one or more blocks of code. It is preferred that the smallest unit of compilation of the code is a block.

[0045] A particularly preferred feature of the invention is that the fragment of code is a dominant path fragment of the code.

[0046] It will be understood that a dominant path fragment includes a fragment including a number of blocks of code which represents a preferred execution route through the relevant code. For example, where a section of code includes a conditional branch, on repeated execution of code through the branch, one path through the branch is likely to be preferred over another path through the branch. The fragment of code associated with the preferred route through the branch is preferably considered to be a dominant path fragment.

[0047] As indicated below, in some cases, another less preferred route through the branch may also be a dominant path.

[0048] In preferred embodiments of the present invention, the dominant path fragments of code include code which is frequently executed. Preferably, the dominant path does not include infrequently executed code. Such infrequently executed code may include, for example, code for handling infrequently encountered exceptions.

[0049] By compiling only the dominant path, in accordance with a preferred embodiment of the invention, the storage overhead of storing compiled code which is rarely executed can be minimised. Further, optimisation techniques can be used to optimise the execution of the dominant path code thus increasing the speed of execution of the dominant path code. Further, the compiler need not waste time on-line in compiling rarely executed code and so the overall speed of execution in the system can be improved.

[0050] In a preferred embodiment of the invention, a fragment of code is considered to be part of a dominant path if it is executed more than a predetermined number of times.

[0051] Preferably, the computer system further includes an execution history recorder for recording the number of times a fragment of code is executed, preferably interpreted.

[0052] Preferably the execution history recorder records the number of times a block of code is interpreted.

[0053] In preferred embodiments of the invention, as well as recording how many times a particular block has been interpreted, the execution history recorder also records further information regarding the execution of the block, for example, from where the transfer of control into the block came and to where control was transferred out of the block. The recorder preferably also records what type of code was executed in the block.

[0054] Preferably a fragment which has been interpreted a number of times which is equal to or greater than a threshold is able to be compiled. Preferably, the threshold is greater than or equal to 2, 5 or even 10.

[0055] Thus the frequently executed blocks of code are compiled. It is generally undesirable for unexecuted blocks to be compiled. In preferred embodiments of the invention, no unexecuted blocks are compiled.

[0056] Preferably the system further includes a compiler manager and the execution history recorder is arranged to alert the compiler manager when a fragment of code has been interpreted the threshold number of times. In preferred embodiments of the invention, the compiler manager administers a queue of frequently executed blocks for compilation. Preferably the queue is managed in such a way that only the more frequently executed blocks are chosen from the queue for compilation by the compiler.

[0057] Preferably, the threshold is able to be dynamically tuned. For the example above, in which the compiler manager administers a queue, if the queue is persistently long, the threshold is preferably raised so that fewer blocks are sent to the queue for compilation.
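
By way of illustration only, the interplay between the execution history recorder, the threshold and the compiler manager's queue might be sketched as follows. The class and method names (`CompilerManager`, `ExecutionHistoryRecorder`, `record_interpretation`, `tune`), the first-in first-out queue order and the fixed tuning step are assumptions made for this sketch and are not taken from the specification.

```python
from collections import Counter, deque


class CompilerManager:
    """Hypothetical manager holding a queue of blocks awaiting compilation."""

    def __init__(self, max_queue_len=4):
        self.queue = deque()
        self.max_queue_len = max_queue_len  # assumed limit for a "long" queue

    def request_compile(self, block_id):
        self.queue.append(block_id)

    def queue_is_long(self):
        return len(self.queue) > self.max_queue_len

    def next_block(self):
        # Simplification: first in, first out. A fuller policy could pick
        # the most frequently executed block instead.
        return self.queue.popleft() if self.queue else None


class ExecutionHistoryRecorder:
    """Counts interpretations per block and alerts the manager at the threshold."""

    def __init__(self, manager, threshold=5, step=1):
        self.manager = manager
        self.threshold = threshold
        self.step = step            # assumed increment used when tuning
        self.counts = Counter()
        self.alerted = set()        # blocks already sent to the queue

    def record_interpretation(self, block_id):
        self.counts[block_id] += 1
        if block_id not in self.alerted and self.counts[block_id] >= self.threshold:
            self.alerted.add(block_id)
            self.manager.request_compile(block_id)

    def tune(self):
        # Called periodically: raise the threshold while the queue stays long,
        # so that fewer blocks are sent for compilation.
        if self.manager.queue_is_long():
            self.threshold += self.step
```

In this sketch the interpreter would call `record_interpretation` each time it interprets a block, and `tune` would be called at intervals chosen by the system.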

[0058] It is highly preferable for the execution history recorder to be arranged to record during the execution of the application. It is preferred for the execution history to be collected on-line so that a representation of the dominant path for the particular execution of the application by the system may be determined and used to generate the compiled code. In the alternative, when information regarding the dominant path is captured from a previous run, there is a risk that conditions may have changed from the previous run and the dominant path of the previous run is not a representation of the dominant path of the present run. Furthermore, the dominant path may change during a run.

[0059] Preferably, the system further includes an interpreter for interpreting the code of the application and the execution history recorder is arranged to record the interpretation of fragments of code. It is more efficient for the interpreter to manage the execution history recordal. It is envisaged that the recordal of execution of compiled fragments of code could be carried out but in many cases it is thought that it would not be worthwhile having regard to the time and memory required to do so.

[0060] Most preferably, the execution history recorder is arranged to record a path of execution from a first fragment to a second fragment. Preferably, the path of execution from a first block to a second block is recorded. In a preferred embodiment, the execution history recorder records, for the execution of a particular block, to where control was transferred from the block. Thus, for a particular block, the most likely successor block can be determined. Thus a dominant path from the particular block can be determined. If the particular block passes the threshold number of executions and is compiled, a dominant path from that particular block through the most likely successors can be compiled.
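
A minimal sketch of how recorded control transfers could yield a most likely successor, and hence a dominant path, is given below. The names `SuccessorRecorder`, `record_transfer` and `dominant_path`, the length cap and the rule of stopping when a block repeats are all assumptions of this sketch rather than details from the specification.

```python
from collections import Counter, defaultdict


class SuccessorRecorder:
    """Hypothetical recorder of where control went after each block."""

    def __init__(self):
        # block id -> Counter of successor block ids
        self.successors = defaultdict(Counter)

    def record_transfer(self, from_block, to_block):
        self.successors[from_block][to_block] += 1

    def most_likely_successor(self, block):
        seen = self.successors.get(block)
        if not seen:
            return None
        return seen.most_common(1)[0][0]

    def dominant_path(self, start, max_len=16):
        # Follow the most likely successor from each block. Stop at a block
        # with no recorded successor, at a block already on the path (a loop
        # back), or at the assumed length cap.
        path = [start]
        current = start
        while len(path) < max_len:
            nxt = self.most_likely_successor(current)
            if nxt is None or nxt in path:
                break
            path.append(nxt)
            current = nxt
        return path
```

Because only executed transfers are ever recorded, a path built this way never includes unexecuted blocks.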

[0061] Thus, preferably, the compiler is arranged to compile a path of fragments.

[0062] Preferably, the system is arranged so that only fragments in which all of the code has been executed are able to be compiled. Some sections of code are not always suitable for compilation. If sections of the code have not been executed, the unexecuted portions might include “hidden” code which is unsuitable for compilation. Compilation of such unexecuted code is avoided in preferred embodiments of the invention.

[0063] In embodiments of the present invention, a block of code is unsuitable for compilation if it has not executed all the way to a control transfer. As a result, there may still be symbolic resolution required, a job left for the interpreter to implement.

[0064] Preferably, the compiled version of the dominant path exposes only one external entry point to the rest of the system. Therefore, assumptions may be made in the compilation of the code. Thus the compiler is preferably arranged to create compiled fragments having only one external entry point.

[0065] Where the fragments of code are compiled, preferably the compiler is able to optimise the compiled code. Such optimisations might include inlining. Where compiled code has been optimised, in particular where assumptions have been made when optimising the code which might later prove to be untrue or too limiting, preferably the compiled code is associated with a marker to indicate that a particular optimisation or assumption has been made.

[0066] In preferred embodiments of the invention, several optimisations are made, in many cases using various assumptions, to produce particularly efficient compiled code for the dominant path.

[0067] Preferably, the system includes a fallback interpreter. Preferably the fallback interpreter is not used when a compiled version of code is available, but is used when no compiled version is available, an exception occurs, or an assumption proves false during execution.

[0068] Preferably the system includes an interpreter and at least one portion of compiled code wherein, on execution of the code, at least a first portion of the code is executed from compiled code and at least a second portion of the code is executed from non-compiled code by the interpreter. Preferably, the system uses a fallback interpreter.

[0069] This feature is of particular importance and may be provided independently. Thus, a further aspect of the invention provides a computer system including an interpreter and further including the code of an application, the code including at least one portion of compiled code, wherein, on execution of the code, at least a first portion of the code is executed from the compiled code and at least a second portion of the code is executed by the interpreter.

[0070] The interpreter can be used where there are no compiled versions of the code available or, for example, where assumptions made in the compilation of the code are found to be untrue. More aggressive optimisation is thus made possible to produce optimised code which might not be ‘safe’ to use in all cases. Where a case is identified in which the compiled version is not safe to use, the fallback interpreter can complete the execution of the necessary code without excessive disruption to the execution and without the need to cease execution while a fresh compiled version of the section of code is produced.

[0071] Preferably the system further includes a searching device for determining whether there is a compiled version of a fragment of code. Thus, the possibility of time being wasted when an interpreter interprets a section of code for which a compiled version is available is reduced. Preferably, the compiler is able to compile on-line. Thus the compiler is able to create compiled versions for any new dominant path fragments which may appear during a run.
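
The choice between compiled code and the fallback interpreter can be sketched, purely as an illustration, as a lookup followed by a fallback. The function `execute_fragment`, the exception class `AssumptionFailed` and the dictionary used as the searching device are hypothetical. The sketch also assumes that a compiled fragment tests its assumptions before it changes any state, so that the interpreter can safely start the fragment again.

```python
class AssumptionFailed(Exception):
    """Hypothetical signal that an optimisation assumption proved false."""


def execute_fragment(fragment_id, compiled_code, interpret, state):
    """Run the compiled version if one exists, else use the fallback interpreter.

    compiled_code: dict mapping fragment ids to callables (stands in for the
                   searching device in this sketch).
    interpret:     callable (fragment_id, state) acting as the fallback
                   interpreter.
    """
    compiled = compiled_code.get(fragment_id)
    if compiled is not None:
        try:
            return compiled(state)
        except AssumptionFailed:
            # Assumed: the compiled code checked its assumption before any
            # side effect, so interpreting from the start is safe.
            pass
    return interpret(fragment_id, state)
```

Execution continues in the interpreter without waiting for a fresh compiled version to be produced.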

[0072] In a preferred system, the system is multi-threaded. Preferably the compiler runs on a separate thread to the thread executing code.

[0073] Preferably, the compiler is able to limit the memory which is used by itself and by the compiled fragments. Thus the compiler preferably has a memory management policy enforced by the compiler to limit the memory used by compilation. This is of particular importance for virtual machines which have limited memory. Preferably the system also includes a deletion device for deletion of compiled code. Thus compiled versions of less frequently used code are able to be deleted to release memory for new compiled code.
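
One possible reading of the memory limit and the deletion device is a code buffer with a fixed budget that deletes the least used fragments to make room. The sketch below is an assumption in every detail: the class `CodeBuffer`, its byte accounting, and the least-used eviction rule are illustrative and are not stated in the specification.

```python
class CodeBuffer:
    """Hypothetical store of compiled fragments with a fixed memory budget."""

    def __init__(self, budget_bytes):
        self.budget_bytes = budget_bytes
        self.used_bytes = 0
        self.sizes = {}  # fragment id -> size in bytes
        self.uses = {}   # fragment id -> times executed since installation

    def record_use(self, fragment_id):
        if fragment_id in self.uses:
            self.uses[fragment_id] += 1

    def delete(self, fragment_id):
        # The "deletion device": release the memory held by one fragment.
        self.used_bytes -= self.sizes.pop(fragment_id)
        del self.uses[fragment_id]

    def install(self, fragment_id, size):
        # A fragment larger than the whole budget can never be stored.
        if size > self.budget_bytes:
            return False
        if fragment_id in self.sizes:
            self.delete(fragment_id)
        # Delete the least used fragments until the new one fits.
        while self.used_bytes + size > self.budget_bytes:
            victim = min(self.uses, key=self.uses.get)
            self.delete(victim)
        self.sizes[fragment_id] = size
        self.uses[fragment_id] = 0
        self.used_bytes += size
        return True
```

A fragment that has been deleted simply reverts to being interpreted until it is compiled again.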

[0074] The present invention finds particular application for virtual machines, in particular in embedded systems. It is envisaged that the invention could also find general use in systems for which there is the choice of executing compiled code and interpreting code. The invention is of particular use in systems having memory constraints.

[0075] The invention also provides a compiler for compiling code in a computer system, the compiler being arranged for the compilation of a fragment of code. Preferably the compiler is arranged for the compilation of a dominant path fragment of the code.

[0076] Accordingly, the invention provides a computer system containing a compiler for compiling the operating code of an application, in which only dominant path (or near dominant path) fragments of the code are compiled.

[0077] This technique can afford the primary advantage of enhancing performance and reducing compiled space. It is important for a small memory application and involves a mixture of trade offs between memory size, compilation time and performance.

[0078] In its preferred form, it also enables the use of key optimisation techniques, involving loops and inlining, without the overhead of global dataflow analysis, and hence allows the compiler itself to execute much faster than compilers that do perform global dataflow analysis. The memory usage of the compiler itself is also much lower.

[0079] In the system as defined, advantageously only the dominant path of execution is compiled, rather than all the paths through the code, while the remaining paths are interpreted.

[0080] It is a particularly preferred feature that the compiler is operating on-line, in the sense that as the operating code is running parts of it are being compiled; what is termed the dominant path may be constantly changing as execution of the code progresses.

[0081] The invention further provides a method of operating a computer system, the computer system including a compiler for compiling the code of an application, wherein a fragment of the code is compiled.

[0082] Preferably, the number of times a fragment of code is executed is recorded by an execution history recorder.

[0083] In a preferred embodiment, the system further includes a compiler manager and the execution history recorder alerts the compiler manager when a fragment of code has been executed a threshold number of times; preferably, the execution history recorder records during the execution of the application.

[0084] The invention provides in a further aspect a method of operating a computer system including an interpreter and further including the code of an application, the code including at least one portion of compiled code, wherein the method includes executing at least a first portion of the code from the compiled code and executing at least a second portion of the code using the interpreter.

[0085] Preferably the compiler compiles on-line. Preferably the memory available to the compiler is limited and preferably the method further includes the step of deleting compiled code.

[0086] Also, according to the invention, there is provided a method of operating a computer system containing a compiler for compiling the operating code of an application, the method including compiling only the dominant path fragments of the code.

[0087] The method can enhance the performance and reduce the compiled space requirement of the computer system and the memory space requirements of the compiler itself.

[0088] Advantageously, information identifying the dominant path is provided from the execution history of the code. The execution history information is preferably derived dynamically as the program runs. Alternatively, the execution history information may be captured from a previous run of the code.

[0089] In its preferred embodiment, infrequently executed code is interpreted in a fallback interpreter, whereby preferably execution of the code can continue without the need for compiled code for the infrequently executed code.

[0090] Advantageously, an online compilation system is provided which can compile code on demand as the application/program executes whereby compilation information can be generated in response to the appearance of a new frequently executed path.

[0091] When the computer system is operating in a multi-threaded system, new fragments of code are preferably incorporated into the multi-threaded system, preferably so as to achieve smoother operation without stopping running threads.

[0092] The invention further provides a method of operating a computer system containing a compiler for compiling the operating code of an application, the method including compiling only the dominant path fragments of the code.

[0093] Preferably the method includes compiling a fragment of the code and preferably includes compiling a dominant path fragment of the code.

[0094] The invention also provides the use of a fallback interpreter to execute infrequently executed code.

[0095] Further provided by the invention is code for a computer system, the code including compiled code produced by a method as aforesaid.

[0096] Any, some, or all of the features of any of the aspects of the invention may be applied to any other aspect.

[0097] Reference will be made, where appropriate, purely by way of example, to the accompanying figures of the drawings (which represent schematically the above improvements) in which:

[0098] FIG. 1A shows paths of execution;

[0099] FIG. 1B shows the comparative costs of compiling dominant paths;

[0100] FIG. 1C shows a dispatch table;

[0101] FIG. 1D is a schematic representation of apparatus for carrying out the invention; and

[0102] FIG. 1E shows paths of execution through code.

[0103] The following considerations apply to any and all the inventions and aspects of the inventions described above.

[0104] 1. Compile Fragments of Code for the Dominant Path Rather Than Whole Methods.

[0105] A summary of a preferred embodiment is as follows:

[0106] The compiler takes as input the runtime representation of the source program, and execution history information (which may be obtained as described below). The execution history information could be live (that is, dynamically changing as the program runs), or captured from a previous run of the program.

[0107] Execution history information is combined with structural information determined from the runtime representation of the program source, to establish what is the dominant path of the program the compiler should compile. Unexecuted code is preferably never included in the dominant path.

[0108] The compiler treats the dominant path as a super-block fragment, laying the code out sequentially, even though the program source may not be. Branches and tests are adjusted where necessary to make the dominant path fall-through. Code and registers are optimised with the assumption that the dominant path will be followed to the end. This improves performance on modern processor architectures. Critically, the dominant path only exposes one external entry point. This greatly simplifies and enhances optimisations.

[0109] As shown in FIG. 1A, where the path of execution would leave the dominant path, the appropriate run-time tests are inserted with a forward branch 1000 to some stub code referred to as an “Outlier” 1002. The outlier stub updates any state which the dominant path has not written back yet, before transferring control out of the fragment. The mainline code of dominant paths is generally kept together, as are the outlier stubs, as shown at 1002. This improves performance on modern processors, especially where branch prediction software/hardware initially assumes that forward branches are less likely. It also provides better instruction cache behaviour.

[0110] Compiling dominant paths of execution allows loop optimisations and inlining to be performed, while simplifying the analysis required for many optimisations. It obviates the need for the compiler to have to resolve symbolic references. That is left to the fallback interpreter.

[0111] For example, when a new class is loaded, symbolic references are used (for example, for fields), so that the first time a reference is seen it is necessary to load the class hierarchy satisfying the symbolic references. Where, in a preferred embodiment of the invention, all of the relevant code has been interpreted at least once, the symbolic references have already been resolved before the code is compiled.

[0112] Often exceptions need to be recognised in the middle of a loop after some global state has changed. The exception check can be performed early outside the loop, forcing the code into the fallback interpreter, thus allowing the check to be removed from the loop, and code motion to be performed in the presence of those exceptions.

[0113] The fallback interpreter will execute the loop and recognise the exception at the right time, albeit more slowly. It is assumed that exceptions rarely occur, and therefore the benefits of the optimised loop will outweigh the disadvantages.
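By way of illustration only, the hoisting of an exception check out of a loop might be sketched as follows. The class and method names are hypothetical, the method interpretLoop merely stands in for the fallback interpreter, and the loop (repeated integer division) is an assumed example; a real Java virtual machine performs the division check itself.

```java
// Illustrative sketch only; all names are hypothetical.
final class HoistedCheckSketch {

    /** Stands in for the fallback interpreter: checks the divisor on every iteration. */
    static int interpretLoop(int p, int a) {
        int x = p;
        for (int i = a; i < p; i++) {
            if (i == 0) {
                throw new ArithmeticException("division by zero at i=0");
            }
            x = x / i;
        }
        return x;
    }

    /** Stands in for the compiled fragment: a single check hoisted out of the loop. */
    static int compiledLoop(int p, int a) {
        // i takes the value 0 only if 0 lies in the range [a, p).
        if (a <= 0 && 0 < p) {
            // Leave the fast path; the slow path raises the exception at the right time.
            return interpretLoop(p, a);
        }
        int x = p;
        for (int i = a; i < p; i++) {
            x = x / i; // divisor is known to be non-zero on this path
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(compiledLoop(3, 1));
        try {
            compiledLoop(3, -1);
        } catch (ArithmeticException e) {
            System.out.println("fallback raised: " + e.getMessage());
        }
    }
}
```

In this sketch the fast loop carries no per-iteration check, and any state changed before the exception is changed by the slow path exactly as it would be without the optimisation.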

[0114] Various optimisations can be made in compiling the code. The optimisations may be made at block level or may be more widespread, in particular where several blocks are involved. An advantage of the preferred embodiments of the invention is that flow analysis need not be carried out. Registers are preferably used for the compiled code to give faster execution of the compiled code.

[0115] Where the fallback interpreter is available for use, it is possible to make various assumptions when compiling the code and to omit several safety checks which might otherwise have been required if no fallback interpreter were available. If later any of the assumptions is proved wrong, or if the lack of safety checks would cause something to go wrong, the fallback interpreter can be used to interpret the relevant non-compiled code.

[0116] When the compiler is being executed online as the application is executed, the compilation overheads are often critical. By only compiling the dominant path, the compiler is simpler, quicker, and uses less memory for its analysis and therefore can afford to perform more optimisations than would otherwise be feasible, especially in a small memory system.

[0117] 2. Use Execution History to Determine Which Paths Through the Application are the Dominant Ones.

[0118] Execution history is captured as the application executes. It is maintained at the block level, when a transfer of control occurs. It is preferred for the execution history recorder to record when a block is entered (when the transfer of control into the block occurs). The execution history recorder may also record other details relating to the execution of the block, for example which is the next block (successor) that was executed after the block in question. Thus information about the preferred route of execution through the blocks of code may be obtained rather than only information about individual blocks.

[0119] For each block an entry count and list of successors is kept with a count associated with each. These counts act as an indicator of popularity. Execution history records also contain an indication of what instruction caused the transfer of control which ends the block. Only blocks that have executed up to the transfer of control are candidates. For blocks which have not executed all of the way through, it is not known what type of code is ‘hidden’ in that part of the block which has not been executed. Such hidden code might contain code which requires symbolic resolution. It is therefore preferred that such blocks are not compiled. Where the count of the block is made in the execution history recorder as the control is transferred from the block, only blocks which have executed to the end will be counted. Alternatively, or in addition, checks can be carried out prior to compilation to check whether the block has executed to the end.

[0120] When memory is constrained, execution history records are recycled in two ways. Firstly, the list of successors is limited to a small number, and when a new successor is encountered the least popular existing successor is replaced with the new one. Secondly, when there are no free execution history records, all of the history records associated with the least frequently used method are moved to the free list.
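A minimal sketch of a per-block history record with a bounded successor list is given below, by way of example only. The class name, the limit of two successors and the use of integer block identifiers are assumptions made for the illustration; the free-list recycling of whole records is not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names and the successor limit are assumptions.
final class BlockHistorySketch {
    static final int MAX_SUCCESSORS = 2; // assumed "small number"

    int entryCount;
    final Map<Integer, Integer> successorCounts = new HashMap<>();

    /** Called when control is transferred into the block. */
    void recordEntry() {
        entryCount++;
    }

    /** Called when control leaves the block for the given successor. */
    void recordSuccessor(int successorId) {
        if (!successorCounts.containsKey(successorId)
                && successorCounts.size() >= MAX_SUCCESSORS) {
            // The list is full: replace the least popular existing successor.
            successorCounts.remove(leastPopular());
        }
        successorCounts.merge(successorId, 1, Integer::sum);
    }

    private Integer leastPopular() {
        Integer victim = null;
        for (Map.Entry<Integer, Integer> e : successorCounts.entrySet()) {
            if (victim == null || e.getValue() < successorCounts.get(victim)) {
                victim = e.getKey();
            }
        }
        return victim;
    }

    /** Returns the successor with the highest count, or null if none is recorded. */
    Integer mostPopularSuccessor() {
        Integer best = null;
        for (Map.Entry<Integer, Integer> e : successorCounts.entrySet()) {
            if (best == null || e.getValue() > successorCounts.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }
}
```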

[0121] In summary, compilation of a fragment is triggered by the entry count of a block exceeding a given threshold. The threshold may be fixed, or dynamically tuned. However, if the state of the history block indicates that the block is already queued to be compiled, or is not compilable, it is ignored. Such a block may not be queued for compilation.

[0122] In a preferred embodiment, when the code is first executed, none of the code is compiled. Execution is initially carried out by the interpreter. As each block is interpreted, the count of the block held by the execution history is increased by one. The execution history recorder records, for each block, from where the transfer of control into the block came and to where the control was transferred from the block. The execution history may also contain further information about the execution of the block, for example the type of code executed in the block. A threshold is set and when the count for a particular block reaches the threshold value, the block is entered on the queue for compilation. The threshold may be 5; when a particular block has been executed 5 times, it is entered on the queue.
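The counting and queueing just described might be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the embodiment: the class, the State values and the string block identifiers are hypothetical, and a block is queued when its count reaches the threshold.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; all names are hypothetical.
final class HistoryRecorderSketch {
    enum State { COUNTING, QUEUED, NOT_COMPILABLE }

    static final class Record {
        int count;
        State state = State.COUNTING;
    }

    final int threshold;
    final Map<String, Record> records = new HashMap<>();
    final Deque<String> compileQueue = new ArrayDeque<>();

    HistoryRecorderSketch(int threshold) {
        this.threshold = threshold;
    }

    /** Called by the interpreter each time it interprets a block. */
    void blockInterpreted(String blockId) {
        Record r = records.computeIfAbsent(blockId, k -> new Record());
        r.count++;
        // Blocks already queued, or marked not compilable, are ignored.
        if (r.count >= threshold && r.state == State.COUNTING) {
            r.state = State.QUEUED;
            compileQueue.push(blockId);
        }
    }

    void markNotCompilable(String blockId) {
        records.computeIfAbsent(blockId, k -> new Record()).state = State.NOT_COMPILABLE;
    }

    public static void main(String[] args) {
        HistoryRecorderSketch recorder = new HistoryRecorderSketch(5);
        for (int i = 0; i < 6; i++) {
            recorder.blockInterpreted("b3");
        }
        System.out.println(recorder.compileQueue);
    }
}
```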

[0123] The compiler is associated with a compiler manager which manages the queue of blocks for compilation. When a particular block reaches the threshold number of executions, the execution history recorder sends a message to the compiler manager to enter the block on the queue for compilation. The compiler is running on a separate thread and checks at intervals to see whether there is an item for compilation in the queue and, at some time, the compiler will start to compile the block referred to at the top of the queue.

[0124] In a preferred embodiment, the queue is managed so that new entries onto the queue are entered at the top of the queue and are therefore most likely to be compiled. When the queue is managed in that way, blocks which reach the threshold many times are more likely to be compiled than blocks which reach the threshold only a few times, or once. So that the queue does not become unmanageable, the compiler manager may delete part or all of the queue from time to time.
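One possible shape for such a queue is sketched below, purely as an example. The class and method names are hypothetical; the sketch assumes that the compiler takes work from the top of the queue and that trimming discards the oldest entries first.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only; all names are hypothetical.
final class CompileQueueSketch {
    private final Deque<String> queue = new ArrayDeque<>();

    /** New entries are entered at the top of the queue. */
    void enqueue(String blockId) {
        queue.addFirst(blockId);
    }

    /** The compiler takes the entry at the top; returns null if the queue is empty. */
    String next() {
        return queue.pollFirst();
    }

    /** Deletes the oldest entries so that at most maxLength remain. */
    void trimTo(int maxLength) {
        while (queue.size() > maxLength) {
            queue.removeLast();
        }
    }

    int length() {
        return queue.size();
    }

    public static void main(String[] args) {
        CompileQueueSketch q = new CompileQueueSketch();
        q.enqueue("b1");
        q.enqueue("b2");
        q.enqueue("b3");
        System.out.println(q.next()); // most recently queued block
    }
}
```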

[0125] If it is found that too many blocks are being queued for compilation, the threshold can be raised. Equally, if few, or no, blocks are being queued for compilation, the threshold can be lowered. This can be carried out dynamically during the execution of the application. The compiler manager can monitor the length of the queue and, if desired, send a message to the execution history recorder to increase or decrease the threshold.
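A simple tuning rule of this kind could look like the following sketch. The water marks, the step of one and the names are assumptions chosen for the illustration; the specification does not fix any particular values.

```java
// Illustrative sketch only; the constants and names are assumptions.
final class ThresholdTunerSketch {
    static final int HIGH_WATER = 8;    // assumed "too many blocks queued"
    static final int LOW_WATER = 1;     // assumed "few, or no, blocks queued"
    static final int MIN_THRESHOLD = 1;

    /** Returns the threshold to use next, given the current queue length. */
    static int tune(int threshold, int queueLength) {
        if (queueLength > HIGH_WATER) {
            return threshold + 1; // queue too long: make compilation harder to trigger
        }
        if (queueLength < LOW_WATER && threshold > MIN_THRESHOLD) {
            return threshold - 1; // queue empty: make compilation easier to trigger
        }
        return threshold;
    }

    public static void main(String[] args) {
        System.out.println(tune(5, 10));
        System.out.println(tune(5, 0));
    }
}
```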

[0126] When the compiler compiles a block which is queued by the compiler manager, it may proceed to compile just that single block. It is preferred, however, that the compiler uses the information gathered by the execution history recorder regarding the successors of the block and compiles not only the single block which has reached the threshold but also the most popular successors of the block, thus compiling the most popular path from the block (the dominant path). It will be appreciated that the successors of the block may or may not have been executed the threshold number of times to be eligible for compilation in their own right but, nevertheless, are compiled as a part of the dominant path from a block which has been executed the threshold number of times.

[0127] When the compiler takes a block for compilation, it carries out checks to determine whether the block is one which is desirable to compile, for example, if it is able to be compiled, and whether there is already a compiled version of the block available.

[0128] The compiler then traces the dominant path (through the most popular successors of the block) until it gets to the end of the method or comes across a piece of code which it is not desirable to compile, for example because a compiled version already exists. Other code which is not desirable to compile would be code which merges back into the dominant path other than at the original block that triggered compilation. Flow analysis would be required for optimal compilation otherwise. The compiler detects and prevents such control flow merges from occurring (having determined the likely flow at a branch, the unlikely flow is handled by generating code to exit the fragment). It will not pass beyond the end of the method but it will follow, for example, invokes to follow the dominant path. When the compiler stops in its tracing of the dominant path, it starts to compile the code, starting at the beginning of the dominant path.

[0129] When a compilation triggers, the dominant path can be determined by following the most popular successors a block at a time, including following method calls.
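The tracing of a dominant path from recorded successor counts can be sketched as below, by way of example only. The names and the representation of the history as nested maps are assumptions; the sketch stops at a block which already has a compiled version, at a block with no recorded successor, and at any block already on the path (so that control flow never merges back into the fragment).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; all names are hypothetical.
final class DominantPathTracerSketch {

    /** successors.get(b) maps each recorded successor of block b to its count. */
    static List<Integer> trace(int start,
                               Map<Integer, Map<Integer, Integer>> successors,
                               Set<Integer> alreadyCompiled) {
        List<Integer> path = new ArrayList<>();
        Integer current = start;
        while (current != null
                && !alreadyCompiled.contains(current)
                && !path.contains(current)) {
            path.add(current);
            current = mostPopular(successors.get(current));
        }
        return path;
    }

    /** Returns the successor with the highest count, or null if there is none. */
    static Integer mostPopular(Map<Integer, Integer> counts) {
        if (counts == null) {
            return null;
        }
        Integer best = null;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (best == null || e.getValue() > counts.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Integer>> history = new HashMap<>();
        history.put(1, Map.of(2, 5, 9, 1)); // block 1 went to 2 five times, to 9 once
        history.put(2, Map.of(3, 4));
        history.put(3, Map.of(1, 2));       // loops back to the starting block
        System.out.println(trace(1, history, Set.of()));
    }
}
```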

[0130] Generally speaking, execution history of the running application is a good indicator of which paths are the dominant ones.

[0131] It will be appreciated that, where there are two or more paths through a method, both or all of the paths through the method may be dominant paths and be compiled if the relevant blocks are executed sufficient times.

[0132] Execution history does not need to be accurate, and can be updated in a number of ways. Rather than track execution history in compiled code, which would slow execution down significantly, execution history is maintained by the fallback interpreter.

[0133] 3. Have a Fallback Interpreter Which Interprets Infrequently Executed Code.

[0134] Having a fallback interpreter means that when infrequent or exceptional code is executed, execution can continue without the presence of compiled code for it. The fallback interpreter maintains execution history. It also means that all issues to do with class resolution can be solely handled by the fallback interpreter.

[0135] Where only the dominant path of the code is compiled, where the path of execution leaves the dominant path, interpretation of non-compiled code will be necessary. Furthermore, optimisations may have been carried out in the compilation of the compiled code and, if it is discovered at a later stage that assumptions which were made in the optimisations were incorrect, the fallback interpreter is used to interpret the relevant section of code. Also, the run starts execution using the interpreter before any compiled versions of the code have been created.

[0136] It will be seen, therefore, that there are many occasions where it might be necessary to pass control of execution from the compiled version to the interpreter and away from the interpreter when compiled code is available.

[0137] As is described in more detail below for a particular embodiment, while the interpreter is translating code, checks are carried out to see if there is a compiled version of the code next to be executed. Thus unnecessary interpretation can be avoided.

[0138] Again, as discussed in more detail below, when control is passed to and from the interpreter and between separate pieces of compiled code, special conversion devices are provided. Examples of such devices are “glue code” and “outliers”. The conversion devices help to ensure the smooth transfer of execution between compiled versions of the code. They hold, for example, information regarding the address of code to be interpreted at the end of a compiled section and are of particular importance where optimisations have been made in the compiled version to ensure that the variables are up to date and are stored on the correct registers, for example, when the execution is transferred.

[0139] For example, when a jump is made from the compiled code to the interpreter, the interpreter expects memory state to be current, so if a memory location has been put into a register for the compiled version, it needs to be returned to the correct memory location before the interpreter proceeds.

[0140] 4. Have an Online Compilation System Which Can Compile Code on Demand as the Application Executes.

[0141] As and when application behaviour changes, a dynamic compiler can generate optimised code for any new frequently executed paths which show up. By running as a separate thread, this allows the application to continue useful work via the fallback interpreter.

[0142] 5. Have the Ability to Incorporate New Fragments of Code into a Running Multi-Threaded System.

[0143] Smoother operation is obtained if a new fragment of code can be incorporated without stopping running threads.

[0144] Once the compiler has completed the compilation of the dominant path for a particular block, it sends a message to the compiler manager that the compilation has been completed. Until complete, the compiled code is kept from the executable code. The compiler manager loads the compiled code in the executable code. The necessary changes are made in the dispatch tables and code cache to indicate that the compiled code is available for the relevant block and where the compiled code is.

[0145] The introduction of the compiled code is carried out atomically so that the stopping of running threads is not required.
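As an illustration of atomic introduction, a dispatch table whose entries are replaced with a single atomic store might be sketched as follows. The class, its methods and the use of a functional interface to stand for a code address are assumptions made for the example; every slot initially refers to an entry that starts interpretation.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.IntUnaryOperator;

// Illustrative sketch only; all names are hypothetical.
final class DispatchTableSketch {
    private final AtomicReferenceArray<IntUnaryOperator> slots;

    /** Every slot starts out pointing at the entry that interprets the method. */
    DispatchTableSketch(int size, IntUnaryOperator interpretEntry) {
        slots = new AtomicReferenceArray<>(size);
        for (int i = 0; i < size; i++) {
            slots.set(i, interpretEntry);
        }
    }

    /** Publishes a completed fragment with one atomic store; no thread is stopped. */
    void install(int slot, IntUnaryOperator compiled) {
        slots.set(slot, compiled);
    }

    /** Running threads see either the old entry or the new one, never a mixture. */
    int invoke(int slot, int arg) {
        return slots.get(slot).applyAsInt(arg);
    }

    public static void main(String[] args) {
        DispatchTableSketch table = new DispatchTableSketch(1, x -> x + 1);
        System.out.println(table.invoke(0, 41));
        table.install(0, x -> x + 1); // same behaviour, now "compiled"
        System.out.println(table.invoke(0, 41));
    }
}
```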

[0146] 6. Support Removal of Fragments of Code From a Running Multi-Threaded System.

[0147] Removal of code fragments is also key to being able to operate in restricted memory environments. It also allows code which was optimised for one dominant path to be replaced with different code when new dominant paths appear. Code can be compiled with optimistic optimisations on the basis that they can be deleted if the optimistic assumptions under which the code was compiled are broken.

[0148] As indicated above, where assumptions made about the dominant path are found to be incorrect for subsequent execution of the code, the fallback interpreter can be used to interpret a non-dominant path through the code. However, if a dominant path which has been compiled is subsequently executed infrequently, it would be desirable to remove the compiled version of the code to release the memory used by the compiled version.

[0149] In some embodiments, the number of times of execution of each piece of compiled code is monitored and, if it is executed infrequently, can be marked as suitable for deletion.

[0150] In a preferred embodiment, the number of times a code buffer is accessed is recorded. Before passing control into a buffer, its execution count is increased. The least popular buffer may be deleted when desirable.

[0151] For example, at a certain point, the compiler may run out of code buffer space. A buffer is then deleted. If a count has been made of the number of times control has been passed into the various buffers, the least popular buffer may be deleted. Alternatively, the oldest buffer may be deleted.

[0152] It will be appreciated that various checks will usually be carried out before the deletion of the buffer to reduce the risk of disruption to the system. See, for example, Agent's reference no. 6 of this specification.
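The selection of the least popular buffer might be sketched as follows. The names are hypothetical and the sketch keeps only the execution counts; the checks which would usually precede deletion in a running system are not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; all names are hypothetical.
final class CodeBufferCacheSketch {
    final Map<String, Integer> executionCounts = new HashMap<>();

    void addBuffer(String bufferId) {
        executionCounts.put(bufferId, 0);
    }

    /** Called before control is passed into a buffer. */
    void enter(String bufferId) {
        executionCounts.merge(bufferId, 1, Integer::sum);
    }

    /** Removes and returns the buffer with the lowest count, or null if there is none. */
    String deleteLeastPopular() {
        String victim = null;
        for (Map.Entry<String, Integer> e : executionCounts.entrySet()) {
            if (victim == null || e.getValue() < executionCounts.get(victim)) {
                victim = e.getKey();
            }
        }
        if (victim != null) {
            executionCounts.remove(victim);
        }
        return victim;
    }

    public static void main(String[] args) {
        CodeBufferCacheSketch cache = new CodeBufferCacheSketch();
        cache.addBuffer("A");
        cache.addBuffer("B");
        cache.enter("A");
        System.out.println(cache.deleteLeastPopular());
    }
}
```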

[0153] The fact that compilation costs can be radically reduced is illustrated by the schematic diagram in FIG. 1B, in which the comparative time taken up in profiling, compiling and executing at full speed for the invention 1020 and the typical prior art 1022 is shown as a proportion of a 10-second time slot.

[0154] Use of the dominant path also allows the dynamic compiler to be memory constrained by truncating a fragment some way along the path when the compiler reaches its budgeted memory limit. This is impossible in prior-art compilers.

[0155] Thus, when the compiler has used all of its allocated memory, the compilation of a fragment can be terminated. It will be understood that suitable steps would usually need to be taken so that at the end of the truncated compiled fragment, control can be passed back to the interpreter so that execution can continue at the correct byte code address and with the correct updated parameters and register structures, where required.

[0156] It is crucial in small memory computer systems that the compiler adheres to a memory budget. Prior art compilers typically view memory as an unlimited resource. Hence they may consume large amounts of memory during compilation, to build internal representations of their input program, and to hold results of dataflow analysis and the like.

[0157] In contrast, the dynamic compiler works within external configurable constraints imposed upon it at system start up or build time. It then compiles as much of a fragment as it can within these constraints. If necessary, it truncates the fragment, by relying on the fallback interpreter to receive control at the truncation point. This is impossible in prior art compilers, where the unit of compilation is a method or greater, and where no interaction with a fallback interpreter is available.
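Truncation against a memory budget can be illustrated by the following sketch. It assumes, purely for the example, that the compiler has an estimate of the memory needed for each block of the dominant path; the names are hypothetical. The returned number is also the index of the first block left to the fallback interpreter.

```java
// Illustrative sketch only; names and the per-block estimates are assumptions.
final class BudgetedCompilerSketch {

    /**
     * Returns how many leading blocks of the dominant path fit within the budget.
     * The interpreter receives control at the block with this index, if any.
     */
    static int blocksThatFit(int[] estimatedBytes, int budgetBytes) {
        int used = 0;
        int compiled = 0;
        for (int cost : estimatedBytes) {
            if (used + cost > budgetBytes) {
                break; // truncate the fragment here
            }
            used += cost;
            compiled++;
        }
        return compiled;
    }

    public static void main(String[] args) {
        int[] path = {40, 30, 50};
        System.out.println(blocksThatFit(path, 100));
    }
}
```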

[0158] There now follows an example of a run in which execution history is used to determine a dominant path, the dominant path fragment is compiled and execution switches between compiled and non-compiled code.

[0159] The system described includes a virtual machine (VM) and includes an interpreter (in C language) and a Java application. The system is multithreaded and includes a Java main thread, a Compiler Manager thread and a compiler thread.

[0160] For example, the Java application includes Class A:

[0161] Class A static main () { for (i=f; i<100; i++) A a = new A(); a.method(i); }

[0162] The Java thread is started:

[0163] Java A

[0164] class load A

[0165] Class A is loaded and A's dispatch table is loaded. The dispatch table is shown schematically in FIG. 1C. FIG. 1C shows A's dispatch table 1030 having various address entries 1032. For example, the main method is located at address 4000.

[0166] The main program of the VM identifies the address of the method main A at 4000 and calls glue code:

[0167] call glue (4000)

[0168] Glue code is a part of the conversion device which enables the execution to switch between the use of the interpreter and the execution of compiled code. Glue code includes several devices for effecting smooth transfer between the execution of compiled code and non-compiled code. Glue code includes sections for one or more of:

[0169] 1. updating states of memory locations and register states.

[0170] 2. passing control to the interpreter when no compiled version of code is available or optimisations made in compiling code are found to be inappropriate.

[0171] 3. passing control away from the interpreter when a compiled version of code for execution is available.

[0172] The conversion device may include outliers as described above for updating the states. For example, when an exception is encountered in execution of compiled code, control may pass first to an outlier for states to be updated before passing to the glue code for instructing the interpreter to begin executing the code for dealing with the exception.

[0173] The glue code then calls the interpreter to start to execute code beginning at address 4000:

[0174] call interpreter (4000)

[0175] The interpreter starts at address 4000 and executes the byte code until it reaches the invoke instruction. The interpreter returns to the glue code which determines that the interpreter is trying to perform the invoke. The interpreter knows where the invoke is in the dispatch table, and tells the glue code.

[0176] The glue code takes the object reference for the method off the stack and looks at the dispatch table to get the address for the method.

[0177] If a compiled version of the start of the method is available, the address of the compiled version will be entered in the dispatch table, and the compiled version is executed.

[0178] If there is no reference to a compiled version of the start of the method, the dispatch table includes an entry for “invoke glue” and a return is effected to a separate section of the glue code which starts interpretation of the method at the relevant address:

[0179] call interpreter (5000)

[0180] When the interpreter jumps into the method, it sends a message to the execution history recorder that the method is about to be executed.

[0181] At the end of the method, there is a return, and the interpreter returns to the glue code which returns the execution to the previous method for interpretation or execution of a compiled version as indicated above.

[0182] The glue code includes a dedicated portion for handling returns which ensures that the register, stacks, and so on are correct for the execution of the next piece of code. For example, where the method has been executed from a compiled version and the next piece of code is to be interpreted, anything put onto registers for the compiled version has to be restored into the correct memory location before the next section of code is generated. Thus the return handling glue code restores any states which have been altered as a result of the use of the compiled code.

[0183] Thus the return to the glue code further returns to the return handling glue code before execution passes to the next portion of code.

[0184] The various portions of glue code described above may all be a part of the same piece of glue code, or may be separate glue code pieces. The updating of the states may be carried out by outliers as described above and in Agent's reference no. 3 of this specification.

[0185] A further example below describes the action of the interpreter for a transfer of control other than an invoke.

[0186] In this embodiment, the following method has just been invoked and is to be executed using the interpreter:

[0187] void func (int p, int a) { int x = p; for (int i=a; i<p; i++) {x=x/i; } }

[0188] The interpreter executes the method in byte code, symbolised in numbered lines as follows:

Bytecode:            Java:
 0 iload_1           x=p;
 1 istore_3
 2 iload_2           i=a;
 3 istore 4
 5 goto 16
 8 iload_3           x=x/i;
 9 iload 4
11 idiv
12 istore_3
13 iinc 4 1          i++;
16 iload 4           i<p? - reiterate if true
18 iload_1
19 if_icmplt 8
22 return

[0189] The method void func is called for the first time. There is no compiled version so the method starts execution by the interpreter. At execution time, the following blocks (groups of lines of code) are recognised by the interpreter:

[0190] b₁={0−5}

[0191] b₂={16−19}

[0192] b₃={8−19} (not a basic block)

[0193] b₄={22}

[0194] The interpreter executes the first block b₁. The interpreter runs an execution history recorder in which it records that b₁ has been executed once and has a count of 1. (Preferably, it also records that the successor of b₁ is b₂ and that b₁ was executed all of the way through. For simplicity, references to the recordal of such extra information are omitted below.)

[0195] At the end of the block, the interpreter consults the code cache to see if there is a compiled version of the next block b₂. Note that in this example, while there is a transfer of control from one block to another, there is not an invoke and thus there is no return to the glue code. In an alternative embodiment, the interpreter might return to the glue code after every block, but that is likely to be time consuming. In the preferred embodiments described herein, the interpreter only returns to the glue code when

[0196] a. it encounters an invoke,

[0197] b. it encounters a return,

[0198] c. it finds from the code cache that there is a compiled version of the next block, or

[0199] d. it encounters an exception.

[0200] In this case there is no compiled version, so the interpreter proceeds to execute b₂, giving b₂ a count of 1 in the execution history recorder. The interpreter consults the cache again and, finding no compiled version of b₃, proceeds to execute b₃. For the present example, the loop is repeated 3 times so when a return is made from the method by block b₄ (going through the return handler glue code as described above), the counts of the blocks in the execution history recorder are as follows:

[0201] b₁=1

[0202] b₂=1

[0203] b₃=3

[0204] b₄=1

[0205] If the threshold for compilation is 5, none of the blocks b₁, b₂ or b₃ will be queued for compilation.

[0206] After the next time the method void func is called, the countswill be as follows:

[0207] b₁=2

[0208] b₂=2

[0209] b₃=6

[0210] b₄=2

[0211] Thus the execution history recorder sends a message to the Compiler Manager to queue b₃ for compilation. At some later time, the compiler will consult the queue, and compile b₃. Before compilation, the compiler determines the dominant path from b₃ using the record for b₃ in the execution history recorder which indicates the successors of b₃. In this simple case, the most popular successor of b₃ is b₃ so that only the single block b₃ representing the loop is compiled. The compilation of b₃ may be optimised for example by using registers to store the values of p, x, i and a. A pre-exception condition check could be inserted for an i=0 check (division by zero) (see Agent's reference no. 2 of this specification). When the compiler has completed the compilation, it notifies the Compiler Manager what compilation has been done, where the compiled version is and whether it includes a method entry point or not. The compiled version is not available for execution at this time.
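The block counts of this example can be reproduced with the following sketch, which is offered as an illustration only. The class and method names are hypothetical, the method interpretFunc mimics the interpretation of func block by block rather than interpreting byte code, and the arguments p=4 and a=1 are assumed so that the loop is repeated 3 times on each call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; all names are hypothetical.
final class FuncHistorySketch {
    static final int THRESHOLD = 5;

    final Map<String, Integer> counts = new LinkedHashMap<>();
    final List<String> queued = new ArrayList<>();

    /** Records one execution of a block and queues it on reaching the threshold. */
    void enter(String block) {
        int count = counts.merge(block, 1, Integer::sum);
        if (count == THRESHOLD) {
            queued.add(block);
        }
    }

    /** Mimics interpretation of func(p, a), one block at a time. */
    int interpretFunc(int p, int a) {
        enter("b1");              // x=p; i=a; goto the loop test
        int x = p;
        int i = a;
        enter("b2");              // the first evaluation of i<p
        boolean again = i < p;
        while (again) {
            enter("b3");          // loop body followed by the loop test
            x = x / i;
            i++;
            again = i < p;
        }
        enter("b4");              // return
        return x;
    }

    public static void main(String[] args) {
        FuncHistorySketch recorder = new FuncHistorySketch();
        recorder.interpretFunc(4, 1);
        recorder.interpretFunc(4, 1);
        System.out.println(recorder.counts + " queued=" + recorder.queued);
    }
}
```

With these assumed arguments, two calls leave only b3 at or above the threshold of 5, and three further calls bring b1 and b2 to the threshold as well.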

[0212] In due course, the compiler manager will load the compiled version of b₃. The code cache is updated so that the host code address for that part of the method now points to where the compiled code is.

[0213] At a later time when the method func is called, the interpreter consults the code cache after execution of b₂ and finds that a compiled version of b₃ is available.

[0214] The interpreter returns to the glue code which, as described above, effects the execution of the compiled version of b₃.

[0215] At a later time still, the method func will have been executed 5 times so that b₁ and b₂ are queued for compilation.

[0216] When b₁ is taken for compilation, the compiler will determine the dominant path from b₁. The successor of b₁ is b₂ (the compiler does not consider b₃ for compilation as part of the dominant path on this occasion because there is already a compiled version).

[0217] The fragment b₁ and b₂ is compiled and the dispatch table is updated.

[0218] On a subsequent execution, the compiled code for b₁/b₂ is executed, a return is made to the glue code, which effects execution of the b₃ compiled code. If the path from compiled b₁/b₂ to the glue to the compiled b₃ is effected a sufficient number of times, a patch connecting the compiled b₁/b₂ to compiled b₃ may be made. (Patching is described in more detail under Agent's reference no. 12 of this specification). Thus the execution can be made more efficient because the step through the glue is no longer required.

[0219] At a later time, a memory manager associated with the compiler manager decides that memory for the compiler should be freed. The oldest buffer chosen for deletion includes the compiled version of b₃. The compiler manager calls the deleter to delete the buffer. Certain checks have to be carried out before deletion (see for example Agent's reference no. 6 of this specification). In the example given above, there is a particular problem because a patch was inserted between the compiled code for b₁/b₂ (which is not deleted) and the compiled code for b₃ (which will be deleted). For a discussion of how this problem may be overcome, see Agent's reference no. 12 of this specification.

[0220] FIG. 1D shows apparatus 1040 suitable for carrying out the embodiment described above.

[0221] The apparatus 1040 includes an interpreter 1042 for interpreting Java code 1043 in the computer system. When the interpreter reaches the end of a block of code, unless there is an invoke or a return, it consults the code cache using the code cache searcher 1044 to see if a compiled version of the next block is available. If there is, the converter device 1046 (which includes the glue code referred to above) carries out the necessary changes and alterations before passing control to an executer 1048 for executing the compiled version 1049 of the code.

[0222] As interpreter 1042 executes, it records in the execution history recorder 1050 which blocks of code have been executed as well as further details about the execution of the block, for example which blocks were executed before and after the block and what type of code was executed.

[0223] The execution history recorder 1050 notifies the compiler manager 1052 when a block is executed a threshold number of times. The block is held in a queue 1054 managed by the compiler manager 1052. A threshold tuner 1056 monitors the length of the queue from information from the compiler manager 1052. Based on information regarding the length of the queue, the threshold tuner 1056 alters the threshold for the execution history recorder 1050 to send a block to the compiler manager.

[0224] A compiler 1058 compiles blocks referred to in the queue 1054. The compiler 1058 uses information from the execution history recorder 1050 regarding the execution of the block to determine the dominant path from the block and prepares a compiled version of the code. When the compiled version is complete, the compiler 1058 notifies the compiler manager 1052 which updates the necessary dispatch tables and code caches and loads the compiled version.

[0225] The compiler manager 1052 includes a memory manager 1060 which monitors the memory available to the compiler 1058. If memory available becomes low, the memory manager 1060 instructs a deleter 1062 to delete some of the compiled code. Also, if the queue 1054 becomes too long, the compiler manager 1052 instructs the deleter 1062 to delete some or all of the queue 1054.

[0226] FIG. 1E shows paths of execution through code of a method generally referred to as 1066.

[0227] The figure shows schematically various fragments of code, for example 1068, 1070, 1072. Such fragments of code may each represent a block of code.

[0228] The code shown in the Figure has one external entry point 1074. After block 1072, there is a conditional branch 1076, for example an exception check. If an exception occurs, the execution passes along path A to code 1078 to handle the exception. Otherwise, execution passes along path B to code block 1080 at which point there may be a call (path C to block 1082) or the execution may follow path D to code sections 1083, 1084. Execution may pass along path E to block 1085 or path F to block 1086.

[0229] Information about execution runs through the code 1066 is recorded on the execution history recorder 1050 run by the interpreter 1042.

[0230] If block 1068 is found to have been executed by the interpreter the threshold number of times, it is passed to the queue 1054. The compiler 1058 consults the execution history in the recorder 1050 and finds that:

[0231] 1. The more popular successor of 1072 is 1080 (that is, execution passed along path B more often than along path A);

[0232] 2. The more popular successor of 1080 is 1083 (that is, execution passed along D more often than along C); and

[0233] 3. The more popular successor of 1084 is 1085 (that is, execution passed along E more often than along F).

[0234] The compiler 1058 determines that the dominant path is therefore 1068, 1070, 1072, 1080, 1083, 1084, 1085 through the code. The dominant path is indicated as 1088.

[0235] While the compiler 1058 was tracing the dominant path 1088, it noted that fragment 1084 was never executed all the way through (path F was never followed). Thus, 1084 is not a suitable candidate for compilation and the dominant path fragment for compilation does not include fragments 1084 or 1085.

[0236] Thus the compiled dominant path fragment includes fragments 1068,1070, 1072, 1080 and 1083.
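
The tracing of the dominant path 1088 from the recorded execution history can be illustrated with a minimal Java sketch. The class name, the map-based history and all of the counts are assumptions made purely for illustration; the specification does not give the recorder's data layout. The sketch follows the most popular successor of each block and does not model the later pruning of fragments 1084 and 1085.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: follows the most frequently taken successor of each block.
final class DominantPathSketch {

    // successorCounts.get(block) maps each successor of that block to the number
    // of times control was recorded passing from the block to that successor.
    static List<Integer> trace(int start, Map<Integer, Map<Integer, Integer>> successorCounts) {
        List<Integer> path = new ArrayList<>();
        Integer current = start;
        // Stop when no successor was recorded or when a block would repeat.
        while (current != null && !path.contains(current)) {
            path.add(current);
            Map<Integer, Integer> counts = successorCounts.get(current);
            Integer best = null;
            int bestCount = 0;
            if (counts != null) {
                for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
                    if (e.getValue() > bestCount) {
                        best = e.getKey();
                        bestCount = e.getValue();
                    }
                }
            }
            current = best;
        }
        return path;
    }

    // Invented counts, chosen only so that paths B, D and E are the popular ones.
    static Map<Integer, Map<Integer, Integer>> exampleHistory() {
        Map<Integer, Map<Integer, Integer>> h = new LinkedHashMap<>();
        h.put(1068, Map.of(1070, 100));
        h.put(1070, Map.of(1072, 100));
        h.put(1072, Map.of(1078, 2, 1080, 98));   // path A versus path B
        h.put(1080, Map.of(1082, 10, 1083, 88));  // path C versus path D
        h.put(1083, Map.of(1084, 88));
        h.put(1084, Map.of(1085, 5));             // path E
        return h;
    }

    public static void main(String[] args) {
        System.out.println(trace(1068, exampleHistory()));
    }
}
```

With these assumed counts the trace visits 1068, 1070, 1072, 1080, 1083, 1084 and 1085 in that order. A fuller model would also record whether each block ran to completion, so that blocks such as 1084 could be excluded from the fragment that is actually compiled.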

[0237] In any or all of the aforementioned, certain features of the present invention have been implemented using computer software. However, it will of course be clear to the skilled man that many of these features may be implemented using hardware or a combination of hardware and software. Furthermore, it will be readily understood that the functions performed by the hardware, the computer software, and such like are performed on or using electrical and like signals.

[0238] Features which relate to the storage of information may be implemented by suitable memory locations or stores. Features which relate to the processing of information may be implemented by a suitable processor or control means, either in software or in hardware or in a combination of the two.

[0239] In any or all of the aforementioned, the invention may be embodied in any, some or all of the following forms: it may be embodied in a method of operating a computer system; it may be embodied in the computer system itself; it may be embodied in a computer system when programmed with or adapted or arranged to execute the method of operating that system; and/or it may be embodied in a computer-readable storage medium having a program recorded thereon which is adapted to operate according to the method of operating the system.

[0240] As used herein throughout, the term “computer system” may be interchanged for “computer”, “system”, “equipment”, “apparatus”, “machine” and like terms. The computer system may be or may include a virtual machine.

[0241] In any or all of the aforementioned, different features and aspects described above, including method and apparatus features and aspects, may be combined in any appropriate fashion.

[0242] It will be understood that the present invention(s) has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.

[0243] Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

[0244] Agent's Reference No. 2: Computer System, Computer-Readable Storage Medium and Method of Operating Same, and Method of Operating that System

[0245] The present invention relates to computer systems and to methods of operating computer systems. In particular, the invention preferably relates to a computer system including a compiler for compiling code and to a method of compiling code in a computer system. Preferably the invention relates to computer systems running interpreted languages, for example Java. The invention preferably relates to object-oriented programs (preferably Java). In a preferred embodiment, the invention relates to pre-exception condition checks.

[0246] In order to avoid problems arising during the course of a program or method execution in an object-oriented program such as Java, safety systems are normally built in which will detect an impermissible situation and throw an error and/or an exception. The system will usually respond to the exception condition being detected and will cease execution in the area where the exception has been detected. In some such systems, an exception handler will be invoked in order to handle the exception, for example to close down an illegal operation, before allowing the execution to continue.

[0247] Java throws both errors and exceptions. For simplicity, these will be referred to herein as ‘exceptions’. It should be understood that the term ‘exception’ used herein is to be interpreted broadly to include, for example, run-time errors, exceptions and other occurrences that occur in the Java language and/or in other languages, unless clear from the context otherwise.

[0248] Java is a language which is rich in exceptions. Java also has various mechanisms for dealing with exceptions when they occur.

[0249] For example, a section of code may include the term ‘y=i/z’. If, when the code is executed, z=0, a ‘divide by zero’ exception is thrown. When compiled, the method containing the possible exception is marked to throw an exception.

[0250] If a method is invoked in Java which has declared itself to throw an exception, then the Java compiler requires any method which invokes that method either to declare the exception as well or to provide an exception handler to deal with it. Thus the exception can ripple up the call chain until it is either caught and dealt with by an exception handler or falls off the end of the chain. This will be well understood by those familiar with the Java language, who will also appreciate that there are essentially two types of exceptions in Java, namely ‘checked’ and ‘unchecked’.

[0251] A ‘checked’ exception will either be ‘caught’ or ‘thrown’. Indeed the compiler will force a checked exception to be caught or thrown. In contrast, an ‘unchecked’ exception is more like a runtime error, such as divide-by-zero, and neither Java nor C++ forces declaration of a throw.
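
The distinction between checked and unchecked exceptions can be seen in a short Java sketch. The class and method names are hypothetical; `ArithmeticException` and `IOException` are standard library classes, chosen here as representative unchecked and checked exceptions respectively.

```java
import java.io.IOException;

// Hypothetical sketch contrasting unchecked and checked exceptions.
final class CheckedVsUnchecked {

    // Unchecked: ArithmeticException need not be declared or caught.
    static int divide(int i, int z) {
        return i / z; // throws ArithmeticException when z == 0
    }

    // Checked: IOException must be declared here or caught in the method body.
    static void mustDeclare(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("illustrative failure");
        }
    }

    static String run() {
        StringBuilder log = new StringBuilder();
        try {
            divide(1, 0);
        } catch (ArithmeticException e) {
            log.append("unchecked:").append(e.getClass().getSimpleName());
        }
        try {
            // Without this catch (or a throws clause on run) the call would not compile.
            mustDeclare(true);
        } catch (IOException e) {
            log.append(";checked:").append(e.getClass().getSimpleName());
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```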

[0252] Consider the situation where a stack is formed in which a particular exception, such as divide-by-zero, is declared in the uppermost, or oldest, frame a, whilst the most recent frames b, c, d and so on are regarded as being added in sequence below frame a. If the exception is encountered in frame d, the evaluation stack for that frame is cleared, the VM creates an exception object and a reference to it will be placed on the evaluation stack of the frame with a matching handler.

[0253] The object reference indicates the type of exception and goes to a table for instructions (assuming there are any for that exception) on how the exception is to be handled. For example, the table might indicate that if the exception occurs in any of lines 1-20, it will be handled in line 21.

[0254] When the exception in d is encountered, first frame d is searched for the handler but, since the exception is declared in frame a, it clearly will not be found, so frame d is wiped and the search continues in frame c. The same situation obtains in c, so the search continues backwards, wiping each of d, c and b in turn, until frame a is reached where the handler can be located. It should be emphasised that only local variables are stored in the wiped frames, so there is no loss of valuable information; all global variables (called arrays, objects, static fields in Java) and objects created in other programming languages (for example) remain stored in the heap.
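
The search through frames d, c and b back to the handler in frame a corresponds to ordinary Java exception propagation, as in the following minimal sketch. The class and the method names a, b, c and d are hypothetical and simply mirror the frame names used above.

```java
// Hypothetical sketch: the exception raised in d propagates through c and b
// (whose frames are discarded) until the handler in a is found.
final class UnwindSketch {

    static int a(int z) {
        try {
            return b(z);
        } catch (ArithmeticException e) {
            // Only a has a matching handler for the divide-by-zero exception.
            return -1;
        }
    }

    static int b(int z) {
        return c(z); // no handler here, so this frame is discarded
    }

    static int c(int z) {
        return d(z); // no handler here, so this frame is discarded
    }

    static int d(int z) {
        int local = 100; // a local variable, lost when frame d is discarded
        return local / z; // raises ArithmeticException when z == 0
    }

    public static void main(String[] args) {
        System.out.println(a(4));
        System.out.println(a(0));
    }
}
```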

[0255] Java is a language rich in exceptions. Java state must be written to as dictated by the semantics of the Java program.

[0256] When the Java program is compiled, however, it is possible to make various optimisations. One such optimisation might be possible in the case where the fragment of code to be compiled includes a loop. It is desirable to move any loop-invariant operations outside the loop to make execution at run-time more efficient. However, that can give rise to difficulties where an exception may occur within the loop. Thus, in the following simple example, one cannot update “x” before the array access is executed, in case the array access “arr[i]” raises an “index out of bounds” exception. If the write to “x” was incorrectly moved before the access, and an exception did occur, we would now have an incorrect value for “x”.

    x = . . . ; b = 10;
    for (int i=a; i<b; i++) {
        arr[i]++;
        x = b;
    }

[0257] Standard code-motion optimisations, such as loop invariance, are thus blocked in the presence of such exceptions, which act as barriers across which code cannot be moved.

[0258] In the above example, “x” is being written with a loop-invariant value (10). In the presence of the potential exception, we cannot move the write outside of the loop. If “a” did not fall within the range of allowable index values for the array “arr”, then the first access to “arr[i]” would raise an exception and “x” would have the same value extant at entry to the loop, and not the value 10. Moreover, the exception check itself executes within the loop body, hence incurring its own execution penalty.
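
The behaviour described for “x” can be observed directly in a small Java sketch. The class name, the helper method and the entry value 7 are assumptions for illustration; the loop body follows the example above.

```java
// Hypothetical sketch: shows which value of x is visible after the loop,
// depending on whether and when arr[i] raises an exception.
final class LoopBarrierSketch {

    // Returns the value of x observed after the loop has completed or thrown.
    static int run(int[] arr, int a, int xAtEntry) {
        int x = xAtEntry;
        int b = 10;
        try {
            for (int i = a; i < b; i++) {
                arr[i]++; // may raise ArrayIndexOutOfBoundsException
                x = b;    // loop-invariant write, only reached if the access succeeded
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            // Fall through so that the caller can inspect x.
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(run(new int[10], 0, 7));  // every access is in range
        System.out.println(run(new int[10], -1, 7)); // the first access fails
    }
}
```

When the first access fails, x keeps its entry value, which is why a compiler cannot simply hoist the write above the loop without some further guarantee.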

[0259] If optimisations are to be made in the case of the compilation of the code of the above example, it would be necessary to carry out an analysis to prove that ‘i’ could never fall outside the range of allowable index values. If that can be proved, then the write to x could be safely moved outside the loop. In order to prove the necessary conditions, complex analysis of the code would be required. In some cases local analysis might be sufficient, for example where it can be shown from an analysis of a basic block of a single method that the exception would not occur. In most cases, however, it will be necessary to look at several blocks, for example back to the block in which the array was created, to be able to make the proof. In that case, global data flow analysis (the analysis of an entire single method) or interprocedural analysis (the analysis of the entire program or class) would be required. Clearly, such analysis is time consuming and costly in memory usage and could really only be contemplated for use in off-line compilation. Moreover, if it is found as a result of the detailed analysis that the exception might occur, optimisation would in any case not be possible. Thus, such analysis is rarely done in practice at runtime on limited memory systems, and optimisations of code in which exceptions may occur are usually not attempted.

[0260] Another example involves an exception condition covering the situation where a point may be reached in a division step where the denominator equals zero.

[0261] This example involves division of a variable x by another variable i. There may be certain circumstances where i becomes zero, leading to division by zero, a non-calculable function, such as follows:

    int x=p; b=10;
    for (int i=a; i<b; i++) {
        x=x/i;
        y=b;
    }

[0262] It is not advisable to throw the exception too early for fear that the program loop may have executed something which is of value. It is not impossible for a loop including a possible exception to be circulated a large number of times (perhaps on average 10 times) before the exception is raised.

[0263] Thus, while it would be desirable to move the loop-invariant term out of the loop to save time at run-time in the repeated execution of the loop, it would not be safe to move the term out of the loop without having carried out detailed analysis.

[0264] The present invention seeks to mitigate this and/or other problems.

[0265] According to the invention, there is provided a method of compiling a fragment of code including a possible exception, the method including the step of including a pre-exception condition check.

[0266] The pre-exception condition check is preferably included in the compiled version of the fragment of code. By using a pre-exception condition check, it can be determined early on, before the code which might raise an exception is executed, whether an exception will occur. If the check shows that no exception will occur, it will then be safe to execute the code including the possible exception.

[0267] It will be understood that the pre-exception condition check will preferably be included immediately before the body of the fragment of code in which the exception might occur. Thus, code other than the pre-exception check can be optimised. Preferably, the condition check is included at the beginning of the compiled fragment. That is especially preferred where the fragment contains a loop.

[0268] Preferably, the fragment of code is compiled on the assumption that the exception will not occur. When the pre-exception check is used, if the check is passed it is known that the exception will not occur. Thus optimisations may be made which would not have been safe to make if it were not known whether or not the exception would occur. Thus the compiled code can be more efficient, both in terms of the increased speed in executing the code as well as being more compact, thus occupying less memory space.

[0269] Preferably, the method includes providing a bailout device for use if the condition check determines that an exception will occur. In many cases, the pre-exception condition check will determine that no exception will occur and execution of the compiled code can proceed. However, in some cases, an exception will occur and the condition check will determine that an exception condition is imminent. The bailout device preferably allows the exception to be encountered in the interpreter at the expected point of execution.

[0270] Preferably, if the original code fragment included code for dealing with the exception, that code is not included in the compiled version as a part of the optimisation procedure. In any case, the code is preferably compiled so as not to be cluttered with code for use in the further detection and handling of exceptions which occur infrequently. Rather than the code for dealing with exceptions being compiled, therefore, preferably an interpreter is used to interpret uncompiled code for handling the exception. Preferably, the bailout device is arranged to pass control to an interpreter. The control is forced to pass to the interpreter because, since there is a compiled version of the code, the interpreter would normally not be used for execution of that code.

[0271] Thus, in effect, the compiled version of the code is preferably prepared only for use when the exception does not occur and is compiled so as to optimise the compiled code for that situation. Where the exception does occur, the compiled code is preferably not used and the interpreter is used to execute up to the point of detecting the condition and raising the exception. It would be possible to provide two versions of the compiled code: one for use in the case where the exception occurred and one for use where the exception did not occur, each version of code being optimised for the relevant situation. In many cases, however, that would be undesirable, especially where the system was one having limited memory (for example a VM). The compiled version of the code for use where an exception occurred would be infrequently used and would clutter up the memory allocated for compiled versions of code.

[0272] Where the compiled code has been optimised, it is possible that the condition of states (for example the values of integer variables and the register states) when the condition check reveals that the exception will occur is not the same as for the corresponding uncompiled code. Preferably, the bailout device includes an outlier for updating states.

[0273] Preferably, the fragment is a dominant path fragment of code. Preferably, at least part of the code forms a loop. In particular where the memory available is limited, as in a virtual machine, it is highly preferable not to compile code which is infrequently executed. Preferably, the method also includes the step of determining a dominant path through the code. Preferably, infrequently executed code, for example non-dominant path fragments of code, is not compiled. Preferably, the compiler compiles only dominant path fragments of code.

[0274] According to the invention there is further provided the use of a pre-exception condition check in compiled code.

[0275] The invention also provides a compiler for compiling code according to the method described above.

[0276] Also provided by the invention is an apparatus for compiling a fragment of code including a possible exception, the apparatus including means for including a pre-exception condition check.

[0277] The apparatus is preferably a part of a computer system, preferably a virtual machine. The invention relates in particular to interpreted languages, and has particular relevance to Java.

[0278] Preferably, the compiler is arranged to include the condition check at the beginning of the compiled fragment and preferably the compiler is arranged to compile the fragment of code on the assumption that the exception will not occur. This is of particular relevance where the fragment includes a loop.

[0279] Preferably, the apparatus includes a bailout device for use if the condition check determines that an exception will occur. The bailout device is preferably provided on compilation by the compiler.

[0280] Preferably, the apparatus further includes an interpreter and the bailout device is arranged to pass control to the interpreter. Preferably, the interpreter is arranged to interpret the code for handling the exception.

[0281] Preferably, the bailout device includes an outlier for updating states. In particular, where control is relinquished from the execution of compiled code, it will often be necessary to update states before the control is passed.

[0282] Preferably, the fragment is a dominant path fragment of code and preferably the compiler is arranged to compile the dominant path code. Preferably, the compiler is arranged to compile only dominant path fragments of code. Preferably the compiler is an on-line compiler. The execution time impact of the compiler and the amount of memory that it uses can be reduced if the compiler only compiles dominant path fragments of code.

[0283] The invention also provides code compiled using a method described above.

[0284] According to the invention, there is also provided code for a computer system, the code including a fragment of compiled code including a possible exception, the code further including a pre-exception condition check.

[0285] Preferably, the code further includes a bailout device for use if an exception is indicated and preferably, the bailout device includes means for forcing a transfer of control to an interpreter.

[0286] Also provided by the invention is a computer-readable storage medium having structured data recorded thereon including code as described above, and also a computer-readable storage medium having a programme recorded thereon for carrying out a method as described above.

[0287] Further provided by the invention is a computer system when programmed with a method as aforesaid, and a computer system when programmed according to a method in which a fragment of code including a possible exception is compiled, the method including a pre-exception check.

[0288] The invention aims to allow optimisations relating to code motion in the presence of exception conditions within loops, which in turn improves the execution speed of the resulting compiled fragment.

[0289] The solution is achieved by use of “pre-exception condition checks”, whereby the compiled fragment contains equivalent checks placed prior to the loop entry point.

[0290] Advantageously, such a check critically relies upon the presence of the fallback interpreter. If the check detects an exception condition, then control reverts to the fallback interpreter without the possibility of re-entering the fragment at this loop entry point. The fallback interpreter continues execution at the loop entry point, and hence executes up to the point where the exception is encountered at its correct control point, thus raising the exception with all Java states containing the correct values. If the pre-exception condition check passes, however, then the fragment is safely usable, and any code motion optimisations are valid.

[0291] In the above example therefore, one could have moved the loop-invariant assignment of “x” out of the loop, so long as it follows the check. This allows omission of the original exception check in the loop, which also offers improved performance.
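
The effect of a pre-exception condition check on the array example can be sketched in Java as follows. This is an illustration only: both methods are ordinary Java, with `interpreted` standing in for the fallback interpreter and `compiled` standing in for the optimised fragment, and the class, method and `State` names are hypothetical.

```java
// Hypothetical sketch: a pre-exception condition check placed before the loop
// lets the write to x be hoisted, with a fallback that preserves exact behaviour.
final class PreCheckSketch {

    // Holder for the Java state that must remain correct if an exception occurs.
    static final class State {
        int x;
        State(int x) { this.x = x; }
    }

    // Stand-in for the fallback interpreter: original order of operations.
    static void interpreted(int[] arr, int a, State s) {
        int b = 10;
        for (int i = a; i < b; i++) {
            arr[i]++;
            s.x = b;
        }
    }

    // Stand-in for the compiled fragment.
    static void compiled(int[] arr, int a, State s) {
        int b = 10;
        if (a >= b) {
            return; // the loop body would never run, so nothing is written
        }
        // Pre-exception condition check: will any index in a..b-1 be out of bounds?
        if (a < 0 || b > arr.length) {
            interpreted(arr, a, s); // bailout: the exception is raised at its correct point
            return;
        }
        s.x = b; // loop-invariant write, hoisted after the check
        for (int i = a; i < b; i++) {
            arr[i]++; // known to be in range, so no check is needed in the loop
        }
    }

    public static void main(String[] args) {
        int[] shortArray = new int[5];
        State s = new State(7);
        try {
            compiled(shortArray, 0, s);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("exception raised on the fallback path, x = " + s.x);
        }
    }
}
```

In this sketch the bailout happens before the compiled path has changed any state, so the fallback path sees exactly the values it would have seen had the fragment never been compiled.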

[0292] Preferably all pre-exception condition checks are effected outside any execution loops, to reduce any time penalty of execution of the checks (in particular where the loop may be repeated a large number of times).

[0293] Preferably the compiled code includes several pre-exception condition checks, to check for several possible exceptions. Such checks may be arranged as a collection of individual checks, or may include a single check which determines whether any of a number of exception conditions exists.

[0294] Preferably, the computer system includes a virtual machine. The method of the invention finds particular application in the context of a virtual machine (VM). A VM requires a small memory footprint in embedded systems, and the present invention allows the footprint of the compiled version of code in the virtual machine to be reduced.

[0295] The invention finds particular application for interpreted languages, where an interpreter may be used, and in particular the Java language. The interpreter can be used as a fallback for when an exception is indicated. If the interpreter were not present, a number of different compiled versions of code might have to be provided to deal with alternative routes through the code, for example in the presence of exceptions. Such an arrangement might reduce, or indeed cancel, any benefit in reduced memory space occupied by compiled versions of the code.

[0296] There is likely to be a balance between the number of checks which can be inserted into the compiled version of code (the checks incurring a time penalty at execution) and the benefit of reduced execution time in execution of the optimised compiled code.

[0297] The benefits of the invention may include increased safety in execution (by use of the condition checks), preferably without incurring increased execution time and memory penalties.

[0298] A further advantage of the invention is the choice of the fast (unchecked) route through the compiled fragment or the slow (exception-detecting) route through the fallback interpreter. The invention enables the fast route to take advantage of code motion (including exception condition checks) outside of a loop, even in the presence of exception conditions within the loop. This choice is unavailable to prior compilers which have compiled the entire method and whose compiled methods do not have the ability to interact with an interpreter to field exception conditions.

[0299] By virtue of the invention, the performance of the compiled fragment may be greatly improved due to the ability to move code out of loops. Hence greater freedom is available to the dynamic compiler in its choice and application of optimisations which are not normally available to prior compilers.

[0300] According to alternative aspects of the invention, there is provided a computer system including (preferably during the running of a program) means for compiling an exception check to identify the occurrence of an exception condition, and means for executing an exception, when identified by the exception check, in an interpreted language.

[0301] Optionally there may also be provided means for carrying out an exception check to identify the occurrence of an exception condition.

[0302] In another aspect, the invention provides a method of operating a computer system including the steps of: running a program; compiling an exception check to identify the occurrence of an imminent exception condition; and executing an exception, when identified by the exception check, in an interpreted language.

[0303] Preferably, the exception check is carried out outside a processing loop, preferably so as to avoid the need for the exception check to be carried out at each circulation of the loop. An advantage of the invention is the choice of taking the fast (unchecked) route through the compiler or the slow (exception-detecting) route through the interpreter, which is not available to prior off-line compilers.

[0304] It may be possible, according to the invention, to decide outside the loop that the exception will be reached at some future point in time. When that occurs, control is passed off to the interpreter and therefore there is no necessity for the exception to be checked in each circulation of the loop.

[0305] The exception check itself is compiled, but interpretation of the exception itself in the slower interpreter serves to save compilation time and, in particular, reduce memory requirements by not having multiple compiled versions of code for dealing with possible exceptions, but does not prejudice optimisation. Indeed, optimisation can be positively enabled. (In Java, exception handling is carried out at a programming level.)

[0306] Any, some or all of the features of any aspects of the invention may be applied to any other aspect.

[0307] The following considerations apply to any and all the inventions and aspects of the inventions described above.

[0308] Preferred embodiments of the invention will now be described, purely by way of example, having reference to the accompanying figures of the drawings (which represent schematically the improvements) in which:

[0309] FIG. 2A shows apparatus for carrying out the method of the invention;

[0310] FIG. 2B shows a fragment of code including an exception; and

[0311] FIG. 2C shows a compiled fragment of code in accordance with the present invention.

[0312] Consider the following example:

[0313] A method is called:

[0314] invoke func (20,200)

[0315] The method func:

    void func (int p, int a) {
        int x=p;
        int b=10;
        int y;
        for (int i=a; i<b; i++) {
            x=x/i;
            y=b;
        }
    }

[0316] It will be seen that an exception will occur if i=0 and a divide by zero is attempted. Previously, it would not have been possible to move the loop invariant code (y=b) out of the loop because, if the exception occurred, the write to x would be affected.

[0317] When the method func is first invoked, it will be executed by the interpreter. If an exception occurs, it will be dealt with in the normal way and, because the code is being interpreted, the write to x will only occur if the exception does not occur. In accordance with a preferred aspect, if fragments of the code of the method func are executed sufficient times by the interpreter such that the fragments are considered to be dominant path fragments of the code, they are queued for compilation. A detailed discussion will be found in Agent's reference no. 1 of this specification.

[0318] From that discussion, it will be seen that it is likely that the loop will be compiled first, and that the dominant path for the loop includes only the block or blocks including the loop.

[0319] As explained in Agent's reference no. 1 of this specification, the repeating loop represents a third block b₃. The byte code (as translated by the interpreter) can be symbolised as follows (the equivalent Java instruction being indicated):

    Bytecode              Java
     0 iload_1            x=p;
     1 istore_3
     2 sipush 10          b=10;
     4 istore 5
     6 iload_2            i=a;
     7 istore 4
     9 goto 21
    12 iload_3            x=x/i;
    13 iload 4
    15 idiv
    16 istore_3
    17 iload 5            y=b;
    19 istore 6
    21 iinc 4 1           i++;
    24 iload 4            i<p ?. Reiterate if true
    26 iload_1
    27 if_icmplt 12
    30 return

[0320] Block b₃ is represented by lines 12 to 27 of the bytecode. When block b₃ has been executed sufficient times, it will be queued for compilation.

[0321] The compiler sees that there is a possible ‘divide by zero’ exception in the block b₃. A pre-exception condition check is inserted into the compiled version of the block b₃. In the present case, the check is inserted at the beginning of the compiled fragment. (It could, of course, be inserted at any point before the exception might occur in the compiled code. Where the exception could occur within a loop, preferably the check is inserted prior to the entry point of the loop. Often, as in the present example, the entry point of the loop will, in any case, be the start of a dominant path fragment.) The compiler will also see that the block b₃ includes a loop invariant term y=b, and that an optimisation can be carried out to remove the loop invariant term from the loop.

[0322] A compiled version of block b₃ might be, for example, as shown in the left-hand column below (given in simplified code for clarity). An indication as to the step performed by each section of compiled code is included in the right-hand column.

    Compiled code          Step performed
    cmp i, 0               compare i with zero
    ble glue_bailout       if i is less than or equal to 0, go to the glue code
    load r_a, b            load b into the register
    store r_a, y           y = b (loop invariant step)
    load r_n, i            load registers for start of loop
    load r_m, x
    div r_s, r_m, r_n      x/i and store result in register s
    add r_n, 1             i++
    cmp r_n, r_a           i<b
    blt                    if i<b, repeat the loop (from div r_s, r_m, r_n)

[0323] The first two lines of the compiled code above include the pre-exception condition check. If i is greater than zero, the check passes and the remainder of the compiled code is executed (from the third line). If i is less than or equal to 0, the second line of code transfers the execution to the glue code of the bailout device as described below. The remainder of the compiled block b₃ is not then executed. Note that the interpreter then interprets the loop from the start of the loop body, through to the point where an exception is detected. Thus the check in the compiled code is giving an early warning of an imminent exception rather than an immediate one. In some cases this can reduce the number of steps carried out in the compiled code which have to be “undone” before control is transferred to the interpreter.

[0324] It will be seen that various optimisations have been made in the compiled version of the loop. In particular, the loop invariant term y=b has been moved outside the loop. That would not have been safe to do if there had not been a pre-exception condition check present.
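
The behaviour of the compiled block b₃ and its bailout can be mimicked in plain Java, as in the following sketch. It is an analogy only: `interpretLoop` stands in for the fallback interpreter and `compiledLoop` for the compiled fragment, and the `Locals` class and all names are hypothetical. Unlike the simplified listing above, the sketch writes the register values back to x and i so that the two routes can be compared.

```java
// Hypothetical sketch of the divide-by-zero pre-exception condition check.
final class DivCheckSketch {

    // The local variables of func that the loop reads and writes.
    static final class Locals {
        int x;
        int y;
        int i;
        Locals(int x, int y, int i) { this.x = x; this.y = y; this.i = i; }
    }

    // Stand-in for the interpreter: executes the loop in its original order.
    static void interpretLoop(Locals s, int b) {
        while (s.i < b) {
            s.x = s.x / s.i; // raises ArithmeticException when s.i == 0
            s.y = b;
            s.i++;
        }
    }

    // Stand-in for the compiled fragment, entered at the start of the loop body.
    static void compiledLoop(Locals s, int b) {
        if (s.i >= b) {
            return; // assumption: the block is normally entered only when i < b
        }
        if (s.i <= 0) {          // cmp i, 0 ; ble glue_bailout
            interpretLoop(s, b); // bailout: the exception is raised at its correct point
            return;
        }
        s.y = b;                 // y = b, moved out of the loop
        int rn = s.i;            // load registers for start of loop
        int rm = s.x;
        do {
            rm = rm / rn;        // x = x / i, cannot fail because rn > 0
            rn++;                // i++
        } while (rn < b);        // repeat while i < b
        s.x = rm;                // write the registers back to the locals
        s.i = rn;
    }

    public static void main(String[] args) {
        Locals s = new Locals(1000, 0, -2);
        try {
            compiledLoop(s, 10);
        } catch (ArithmeticException e) {
            System.out.println("x=" + s.x + " y=" + s.y + " i=" + s.i);
        }
    }
}
```

Starting from i = -2, the check fails on entry and the fallback route performs two divisions before raising the exception at i = 0, which illustrates the check acting as an early warning of an imminent exception rather than an immediate one.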

[0325] The above example has been simplified. In practice there may also be an ‘index out of bounds’ pre-exception condition check (either before or after the i is less than or equal to 0 check), for the situation where i is out of bounds for the execution of the loop. Thus, each section of compiled code may have several pre-exception condition checks. Examples of types of pre-exception condition checks are discussed below.

[0326] For a detailed discussion of the execution of code including compiled and non-compiled fragments see Agent's reference nos. 1 and 3 of this specification. A summary of some of the steps is given here for the above example in the case in which the condition check determines that there is an exception condition.

[0327] The first line of the compiled code is executed to check if i is less than or equal to 0. If it is, the second line of code directs the execution to a specific entry point of the glue code. The glue code then forces control to pass to the interpreter. The glue code tells the interpreter at which address to start to interpret code, and not to consult the code cache before executing, because the code cache will contain a reference to the compiled version and in this case the compiled version cannot be used. The glue code indicates to the interpreter to recommence execution at the beginning of the non-compiled version of the block b₃ (from iload_3, see above). The interpreter sees the exception at the correct time and it is dealt with accordingly. (The interpreter cannot raise the exception too early.)

[0328] Once the interpreter has executed the fragment including the exception, the control may pass back through the glue code for the execution of a compiled version of code as discussed in Agent's reference no. 1 of this specification.

[0329] Equally, where an ‘index out of bounds’ pre-exception condition check is inserted, if the relevant check fails, control is passed to the glue code, and to the interpreter.

[0330] A separate pre-exception condition check could be used for any exception which could occur in the code to be compiled. One pre-exception condition check could be used to check for several possible exceptions.

[0331] A suite of such pre-exception checks are available for use,including early typecast check, early bounds check against the possiblerange of array index values, early null-reference check, early divide byzero, and early object type check, to enable code motion and other earlychecks to be applied to inlined methods.

[0332] A checkcast check proves whether or not an object of a given type can be stored in a field for that type; for example, the check could answer the question whether a ‘graphics’ type object could be stored in a ‘car’ type object.

[0333] Java (and other object oriented languages) has a hierarchical structure for classes where if Class A extends a Class O and Class B extends a Class O then Class A and Class B are not related. Conversely, if Class A extends Class O and Class B extends Class A then Class B is a subclass of A and the system could use B where it uses A. Thus it will be seen that there is scope for an exception to arise where the hierarchy of objects in a section of code is not appropriate.

[0334] The checkcast condition check checks to see that the class relationship is correct and, if the checkcast check fails, control passes to the bailout device.
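The class relationships described above can be tried out with a short Java sketch. It is an illustration under stated assumptions, not the check emitted by the compiler of the specification: the class CheckcastSketch, the nested classes O, A, B and C, and the method castWouldSucceed are hypothetical names. The standard library method Class.isAssignableFrom is used to ask whether a value of one class may be used where another is expected.

```java
/** Hypothetical sketch of an early typecast (checkcast) test on a small class hierarchy. */
final class CheckcastSketch {
    static class O {}
    static class A extends O {}
    static class B extends A {}   // B is a subclass of A, so B may be used where A is expected
    static class C extends O {}   // A and C both extend O but are not related to each other

    /** Returns true if a value of class 'actual' may be stored where 'expected' is required. */
    static boolean castWouldSucceed(Class<?> expected, Class<?> actual) {
        return expected.isAssignableFrom(actual);
    }

    public static void main(String[] args) {
        System.out.println("B where A expected: " + castWouldSucceed(A.class, B.class));
        System.out.println("C where A expected: " + castWouldSucceed(A.class, C.class));
    }
}
```

A compiled fragment could evaluate such a test at its entry and pass control to the bailout device when the test fails, rather than raising the exception itself.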

[0335] A bounds check, as the name implies, proves whether the array index is within the permitted limits, that is, the bounds, of the array; otherwise it refers to the bailout device (glue code) to raise the exception for the index being out of bounds. An example is given above of a situation in which an ‘index out of bounds’ exception might be raised.

[0336] A null-reference check identifies whether a field reference is null, in which case nothing can be done with that field.

[0337] As an example, consider the following steps:

[0338] aload s

[0339] //push reference for an object onto stack

[0340] getfield

[0341] At this stage the ‘getfield’ loads the specified field from the object. If the situation arises:

[0342] aload s

[0343] getfield (class X, field Y)

[0344] then if s is null, nothing further can be done and an exception must be raised by the getfield. The pre-exception condition check determines whether there will be a null. If so, the bailout device is called.

[0345] A divide-by-zero check, as has already been discussed in the examples above, determines whether a situation will or may be reached where the denominator of a divider function becomes zero, an uncomputable function.

[0346] An object type check can best be described as a check to ensure that objects are fitted into the hierarchical structure of an object-oriented system with the correct implementation of methods.

[0347] As an illustration of this check, consider the situation where a method might call draw where draw is a method for drawing an object of the Graphics class. If there is no subclass of graphics at that stage which includes a different implementation of draw, it can be assumed that the method draw is final and will not be overridden by a new draw method. Thus, it is assumed that the draw method is not polymorphic, even though it is potentially polymorphic. The code can be compiled with the assumption that the draw method is final. Optimisations can be made based on that assumption, for example inlining of the method draw into the code. See Agent's reference no. 9 of this specification.

[0348] The object type check is made to determine whether the called method can appropriately be implemented on the relevant object. In the present example, the check will determine whether the object is a graphics type rather than anything else and whether the draw method is appropriate for the object.

[0349] Apparatus for carrying out the method of the present invention is shown schematically in FIG. 2A. The apparatus includes an interpreter 2000 for interpreting code. An execution history recorder 2002 records details of the execution of the code by the interpreter 2000. When a block of code is executed a predetermined number of times, the execution history recorder 2002 notifies the compiler manager 2004 which administers a queue of blocks for compilation. The compiler 2006 consults the queue, takes blocks for compilation, and determines the dominant path from the records of the execution history recorder 2002. The compiler also determines whether there are any possible exceptions which may occur in the dominant path fragment to be compiled. If so, the necessary pre-exception condition checks are inserted at the beginning of the compiled fragment of code. The compiler 2006 compiles the fragment and sets up any necessary links to bailout devices 2008. The compiled code is executed by the execution device 2010. If the pre-exception condition check indicates that an exception will occur, the bailout device 2008 transfers to glue code 2014 which passes control to the interpreter 2000 for execution of non-compiled code relating to the exception.
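The notification step between the execution history recorder and the compiler manager might be sketched as follows. This is a minimal illustration only: the class CompileQueueSketch, its methods, the use of an integer address to identify a block, and the choice to queue a block exactly once when its count first reaches the threshold are all assumptions, not details given in the specification.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch: count interpreted executions and queue hot blocks for compilation. */
final class CompileQueueSketch {
    private final int threshold;
    private final Map<Integer, Integer> counts = new HashMap<>();
    private final Queue<Integer> pending = new ArrayDeque<>();

    CompileQueueSketch(int threshold) {
        this.threshold = threshold;
    }

    /** Called each time the interpreter executes the block starting at the given address. */
    void blockInterpreted(int address) {
        int executions = counts.merge(address, 1, Integer::sum);
        if (executions == threshold) {
            pending.add(address);   // queued once, when the threshold is first reached
        }
    }

    /** Called by the compiler; returns null when no block is waiting. */
    Integer nextForCompilation() {
        return pending.poll();
    }

    public static void main(String[] args) {
        CompileQueueSketch manager = new CompileQueueSketch(3);
        for (int i = 0; i < 5; i++) {
            manager.blockInterpreted(3000);
        }
        System.out.println("queued: " + manager.nextForCompilation());
    }
}
```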

[0350] FIG. 2B shows a section of uncompiled Java code 2100. Code section 2100 would be executed using the interpreter 2000.

[0351] The section 2100 includes a loop 2102. Within the loop 2102 is a possible exception 2104 (for example a division which might result in a ‘divide by zero’ exception). The loop 2102 also includes a loop invariant term 2106 which it is desired to move out of the loop to increase the speed of execution of the loop 2102.

[0352] After several executions of the code 2100, it is found that the code fragment forming the loop 2102 is a dominant path fragment of code and it is queued for compilation. FIG. 2C shows the compiled version of the code fragment (indicated generally as 2108). The compiled code fragment 2108 includes a pre-exception condition check 2112 to check to see whether the exception will occur. The compiled version still includes a loop 2114 but, due to optimisations made in the compilation, it is smaller than before, and quicker to execute. The loop invariant term 2116 has been moved out of the loop 2114, to increase the speed of execution. The pre-exception condition check 2112 includes a path 2118 to a bailout device 2008 for the case in which it is found that an exception will occur.
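The transformation of FIGS. 2B and 2C can be approximated at source level by the following Java sketch. The names HoistingSketch, original and optimised are hypothetical, and the loop body is an invented example rather than the code of the figures. The method original plays the part of the uncompiled code 2100, and optimised plays the part of the compiled fragment 2108, with a check at its entry and the invariant division moved out of the loop.

```java
import java.util.Arrays;

/** Hypothetical sketch: a pre-exception check makes it safe to hoist a loop invariant. */
final class HoistingSketch {
    /** Original form: the invariant term scale / divisor is recomputed on every iteration. */
    static int[] original(int[] in, int scale, int divisor) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i] + scale / divisor;   // may raise 'divide by zero' inside the loop
        }
        return out;
    }

    /** Optimised form: check first, then compute the invariant term once. */
    static int[] optimised(int[] in, int scale, int divisor) {
        if (divisor == 0) {
            // Bail out: rerun the original code so that any exception is raised
            // at the point, and only in the circumstances, the original would raise it.
            return original(in, scale, divisor);
        }
        int term = scale / divisor;             // hoisted; cannot throw after the check
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i] + term;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(optimised(new int[] {1, 2}, 10, 5)));
    }
}
```

Note that with an empty input array and a zero divisor the original loop never divides, so no exception is due; bailing out to the original code, rather than raising the exception from the check, preserves that behaviour.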

[0353] In any or all of the aforementioned, certain features of the present invention have been implemented using computer software. However, it will of course be clear to the skilled man that any of these features may be implemented using hardware or a combination of hardware and software. Furthermore, it will be readily understood that the functions performed by the hardware, the computer software, and such like are performed on or using electrical and like signals.

[0354] Features which relate to the storage of information may be implemented by suitable memory locations or stores. Features which relate to the processing of information may be implemented by a suitable processor or control means, either in software or in hardware or in a combination of the two.

[0355] In any or all of the aforementioned, the invention may be embodied in any, some or all of the following forms: it may be embodied in a method of operating a computer system; it may be embodied in the computer system itself; it may be embodied in a computer system when programmed with or adapted or arranged to execute the method of operating that system; and/or it may be embodied in a computer-readable storage medium having a program recorded thereon which is adapted to operate according to the method of operating the system.

[0356] As used herein throughout the term ‘computer system’ may be interchanged for ‘computer’, ‘system’, ‘equipment’, ‘apparatus’, ‘machine’ and like terms. The computer system may be or may include a virtual machine.

[0357] In any or all of the aforementioned, different features and aspects described above, including method and apparatus features and aspects, may be combined in any appropriate fashion.

[0358] It will be understood that the present invention(s) has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.

[0359] Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

[0360] Agent's Reference No. 3—Computer System, Computer-Readable Storage Medium and Method of Operating Same, and Method of Operating that System

[0361] The present invention relates to a computer system and to a method of operating a computer system. Preferably, the invention relates to the management of memory in a computer system, and in particular to the management of cache memory in a computer system. In a preferred embodiment, the invention relates to outliers for spatial separation of infrequent code etc.

[0362] In a computer system there are various levels of cache memory. It is of benefit to the system, in terms of improved efficiency and therefore speed, if the caches themselves can be operated efficiently. It has been appreciated pursuant to the present invention that it would be advantageous to have code which is likely to be executed frequently located in the caches and in particular in the fastest cache. In the embodiment of the invention described below, Java code is compiled for faster execution at run-time using a dynamic compiler. In order to improve the density of useful code in the cache (cache density), as one of the aims of the invention, it would be beneficial to have in the fastest of the caches the compiled code that the dynamic compiler has produced.

[0363] Prior art solutions do not maximise the density of cache memory. For example, as is discussed in more detail below, it has been appreciated that the fast caches of prior art systems are often occupied by large amounts of infrequently accessed code, reducing the density of frequently accessed code in the cache, which may lead to more cache misses. The present invention seeks to mitigate this and/or other problems.

[0364] According to a first aspect of the present invention, there is provided a computer system including a compiler, the compiler being arranged to compile dominant path fragments of code.

[0365] A dominant path represents a frequently executed path of execution through the code and may include a large number of individual blocks of code. By arranging for the dominant path to be compiled (and preferably only the dominant path to be compiled), the density of useful code in the compiled version of the code is increased since the compiled version includes only code which is executed frequently. Thus the density of useful code in the cache can be increased.

[0366] By arranging for the dominant path to be compiled, it is possible to arrange for blocks of code including the most frequently executed paths through the code to be more likely to be stored in the cache, and more likely to be stored in the same (L1) cache as other blocks of the dominant path code. Thus the run-time execution of the dominant path can be faster.

[0367] Preferably, the system further includes an execution history recorder for recording information about the dominant path. Preferably, an on-line record of the dominant path is made during the execution run. Preferably, therefore, the system includes means for determining the dominant path fragment during the execution of the code.

[0368] Preferably, the system further includes a compiler for compiling code and, preferably, the compiler is arranged to compile a dominant path fragment. Preferably, the compiler is an on-line compiler. Preferably, the dominant path fragment does not include infrequently executed code. Thus, if the dominant path fragments of code are arranged separately from infrequently executed fragments of code, management of the memory of the system can be improved.

[0369] Further discussion of preferred features in the compilation of the dominant path can be found in Agent's reference no. 1 of this specification.

[0370] Preferably, the system further includes an outlier for use where a path of execution leaves the dominant path.

[0371] According to a second aspect of the invention, there is provided a computer system including outliers for use in the execution of infrequently executed code.

[0372] Where the path of execution would leave the dominant path, for example, due to a conditional transfer to a non-dominant location of the code or due to an exception condition being detected, control is passed to the outlier. Preferably the outlier is in the same code buffer as the fragment of dominant path from which control is transferred.

[0373] The dominant path is a ‘best guess’ of the likely path of execution through the code based on current behaviour. It will sometimes prove to be inapplicable for a particular execution of the code. The outliers are used to deal with the situation.

[0374] Preferably, the system further includes an interpreter. Preferably, the interpreter is used to execute at least some of the infrequently executed code. Preferably, the system further includes a converter for converting between the execution of compiled code and non-compiled code. The converter preferably includes outliers.

[0375] Where the execution has left the dominant path due to a conditional transfer, preferably, the outlier is adapted to effect transfer of control to the interpreter.

[0376] Where execution has left the dominant path due to an exception being encountered, preferably, the outlier is adapted to transfer control to an exception handler.

[0377] Preferably, the outlier is adapted to update states before execution of infrequently executed code. Such updating may be required where control is being passed to a new non-dominant path, which is typically interpreted until that new section warrants compilation; for example, it may be required where optimisations have been used in the compilation of the dominant path code.

[0378] Preferably, the code includes a conditional branch to the outlier, the conditional branch including a conditional test and being such that execution follows the dominant path if the conditional test fails. Processors often predict that forward branches will fail and will carry out various checks before the branch is carried out. If the condition of the branch occurs rarely so that usually the execution falls through (in the dominant path), when the code for the condition is compiled, the code is arranged so that if the condition is true, the control passes to the outlier. Thus the forward branch occurs only rarely and thus the processor checks are only carried out on the rarely executed jump to the outlier. Thus, processor time can be reduced because the condition is usually not true and the execution simply drops through to follow the dominant path.

[0379] Preferably, the system includes means for separating frequently executed code from infrequently executed code.

[0380] That is a particularly important feature of the present invention which may be provided independently; thus the invention further provides a computer system including means for separating frequently executed code and infrequently executed code.

[0381] By separating the frequently executed code from the infrequently executed code, it is made possible for memory of the system to be managed more efficiently. For example, it makes it possible to arrange for less of the infrequently executed code to be pulled into the cache. That can give improved execution speed of the frequently executed code at runtime by reducing the cache misses. The means for separating the code may be provided by a compiler which compiles the code in a particular way as described in more detail below. The separation may be effected by arranging that certain types of code are stored in one memory area and other types of code are stored in a different memory location.

[0382] Preferably, the system further includes an outlier, and means for separating dominant path fragments from the outlier.

[0383] Thus the system preferably includes means for storing the frequently executed code in a first memory region and means for storing infrequently executed code in a second memory region. Preferably, the system includes means for storing the dominant path fragments in a first memory region and means for storing outliers in a second memory region. Preferably, the first memory region and the second memory region are regions of a code buffer.

[0384] Preferably the frequently executed code and infrequently executed code are generated in different areas of the code buffer. For example, the system may include means for storing the infrequently executed code “backwards” in the buffer. Preferably, the system includes means for storing the dominant path fragments and the outlier at opposite ends of the code buffer.

[0385] By storing the code in that way, it is possible to arrange the code so that frequently executed code is likely to be drawn into a code cache while infrequently executed code is unlikely to be pulled into the cache. Therefore, preferably the code is stored so that infrequently executed code is unlikely to be pulled into a cache.

[0386] That is a particularly important feature of the present invention, and can be provided independently. Thus the invention further provides a computer system including a code cache, the system being arranged so that infrequently executed code is unlikely to be stored in the cache.

[0387] Preferably, in the compilation of the dominant path, the frequently executed code includes the compiled dominant path fragments. Those fragments are preferably generated forwards in the code buffer. The outliers are preferably generated backwards in the code buffer, and are thus spatially separated from the dominant path fragments. Thus the memory occupied by the outliers in the code buffer can be much less than that which would be occupied by a compiled version of the corresponding infrequently executed fragments of the uncompiled code.

[0388] The present invention further provides a computer system including means for storing substantially all of (and preferably only) the dominant path compiled code together in one memory region. Preferably, the system further includes means for storing code for dealing with the non-dominant cases in spatially separate regions.

[0389] The present invention also provides a method of operating a computer system, the method including compiling dominant path fragments of code. Preferably, the method includes determining the dominant path during the execution of the code.

[0390] Preferably, an outlier is used when a path of execution leaves the dominant path, and preferably the outlier effects transfer of control to the interpreter and/or to an exception handler.

[0391] Preferably, the outlier updates states before execution of infrequently executed code.

[0392] Preferably, where the code includes a conditional branch to the outlier, the conditional branch includes a conditional test such that execution follows the dominant path if the conditional test fails.

[0393] Preferably the method includes separating frequently executed code from infrequently executed code.

[0394] Also provided by the invention is a method of operating a computer system, including separating frequently executed code and infrequently executed code.

[0395] Preferably, the method includes separating dominant path fragments from outliers and preferably storing the dominant path fragments in a first memory region and storing outliers in a second memory region. Preferably, the first memory region and the second memory region are regions of a code buffer. Preferably the method includes storing the dominant path fragments and the outliers at opposite ends of the code buffer.

[0396] Preferably the method includes storing the code so that infrequently executed code is unlikely to be pulled into a cache.

[0397] The invention also provides a method of storing code in a computer system including a code cache, the method being such that infrequently executed code is unlikely to be stored in the cache.

[0398] According to the present invention, there is further provided a method of operating a computer system including the steps of: compiling dominant path code, and storing the compiled code in one memory region. Preferably, the method includes storing outliers in a separate memory region.

[0399] Also provided by the invention is a method of compiling code, the compilation being effected so that frequently executed code is separate from outliers.

[0400] The invention also provides code stored in a computer system by a method described herein and provides a compiler for compiling code in accordance with the invention.

[0401] The invention further provides a computer-readable storage medium having a programme recorded thereon for carrying out a method according to the invention.

[0402] The invention also provides a computer-readable storage medium having a programme recorded thereon for compiling code, the compilation being effected so that frequently executed code is separate from outliers.

[0403] The invention further provides a computer programmed according to a method as aforesaid.

[0404] The invention also provides a computer programmed for compiling code, the compilation being effected so that frequently executed code is separate from outliers.

[0405] Accordingly, the invention provides a computer system including means for storing substantially all of (and preferably only) the dominant path compiled code together in one memory region, whilst, preferably, any outlier is only stored in spatially separate regions. Such a memory layout typically maximises the amount of useful code loaded into the cache.

[0406] The invention also provides a method of operating a computer system including the steps of: compiling all of the dominant path code; and storing substantially all of the compiled code in one memory region, while preferably storing outliers in a separate region.

[0407] An ‘outlier’ is so called since it lies out of the normal memory region for predominantly executed code. In this way the infrequently executed code (by which may be meant the non-dominant path code) is separated from the more frequently used dominant path code, and so does not get loaded into the cache as long as the dominant path is executing.

[0408] Any, some or all of the features of any aspect of the invention may be applied to any other aspect.

[0409] The following considerations apply to any and all the inventions and aspects of the inventions described above.

[0410] Reference will be made, where appropriate, purely by way of example, to the accompanying figures of the drawings (which represent schematically the above improvements) in which:

[0411] FIG. 3A shows a section of code before compilation;

[0412] FIG. 3B shows a standard compilation of the code of FIG. 3A;

[0413] FIG. 3C shows compilation of code in accordance with a preferred embodiment;

[0414] FIG. 3D shows a code buffer;

[0415] FIG. 3E shows the memory arrangement in a computer system; and

[0416] FIG. 3F shows apparatus for carrying out the method of the invention.

[0417] FIG. 3A shows a section of Java bytecode including blocks B1, B2, B3, B4 and B5 which carry out calculations 1, 2, 3, 4 and 5, respectively. B4 is code which deals with exceptions which may occur in B1, B2 or B3 (see paths 9000, 9002 and 9004 to B4). The dominant path through the blocks is found to be such that control (almost) always passes from B1 to B3 (path 9006) at the conditional transfer of control at the end of B1, and B3 passes control to B5 (path 9008). The paths 9000, 9002 and 9004 are hardly ever taken.

[0418] An outline of the original Java source for the example of FIG. 3A is

    void method () {
        try {
            calculations 1      // calculations 1 and if (condition)
            if (condition) {    // translates to block B1
                calculations 2  // calculations 2 translates to block B2
            }
            calculations 3      // translates to block B3 and a jump to B5
        } catch () {
            calculations 4      // translates to block B4
        }
        calculations 5          // translates to block B5
    }

[0419] Suppose that predominantly the condition is false, and none of the calculations 1, 2 or 3 encounters an exception which would be caught by the catch clause (block B4). Therefore, the useful code based on this dynamic behaviour consists solely of blocks B1, B3 and B5.

[0420] Standard compilation techniques for this code (especially in the case of compilation at runtime) would emit code for all five blocks, to allow for all eventualities in the subsequent execution of the compiled code. Thus the compiled versions of B2 and B4 potentially waste memory space, and as detailed below can lead to reduced cache density of useful code compared to preferred embodiments. If many such methods are compiled in this standard manner, the wider range of address space used to encompass the compiled code can lead to control transfers crossing address space page boundaries more frequently, with ensuing higher frequency of page faults (if virtual memory is enabled on the computer system), compared to preferred embodiments.

[0421] As a program runs, the processor picks up instructions from the memory. When the instructions for the program run over the end of a page, the memory manager must be interrogated to find and check the next page if that next page is not in main memory. That is time consuming. Crossing a page boundary is therefore time consuming.

[0422] A standard compilation of the code is shown in FIG. 3B. Blocks B1, B2, B3, B4 and B5 are set out sequentially.

[0423] FIG. 3C shows compiled code according to a preferred embodiment. Note that the dominant path includes blocks B1, B3 and B5.

[0424] The compilation of the code has inverted the logic of the condition test in block B1, so that the predicted fall through case is to block B3, and the unpredicted flow of control is to an outlier OL1. Note that the code for the blocks B1 and B3 is spatially contiguous despite not being contiguous at the source and bytecode levels. This is advantageous to modern processors with branch prediction hardware. Note also that this contiguity by definition occupies a smaller range of the memory address space than if block B2 had been inserted in between.

[0425] Blocks B2 and B4 do not exist in the compiled versions of the code because they were found not to be a part of the dominant path.

[0426] B5 is also spatially contiguous with block B3, and the original unconditional control transfer present in the bytecode for jumping over the exception handler B4 requires no corresponding host instruction. Block B3 simply drops through into block B5 in terms of control flow. Thus blocks B1, B3 and B5 are spatially contiguous, and hence occupy a smaller range of the memory address space in total than if they were interspersed with blocks B2 and B4. These blocks (B1, B3 and B5) have been packed to model the current execution characteristics of the Java method.

[0427] When B1 first receives control, requiring loading of a cache line into the processor, better cache density ensues in the immediately affected cache line. Code infrequently executed (in blocks B2 and B4) does not get pulled into the cache.

[0428] Now consider several methods (or dominant paths thereof) compiled in a similar manner, and into a given code buffer. As these pass control amongst each other, the cache perturbations will be reduced by having a greater cache density of useful code. Low cache density can lead more frequently to cache-collisions and cache-misses. Also, with computer systems employing virtual memory, preferred embodiments can give a reduction in page faults, as a consequence of the reduction in address space usage for frequently executed code. A page fault occurs when the processor tries to execute an instruction which is not in memory. When a page fault occurs, the page in which the instructions to be executed are located is loaded into memory from the permanent storage device that is being used for the virtual memory. This is a time consuming operation which slows down execution.

[0429] FIG. 3C shows outliers OL1 and OL2 for use if the execution leaves the dominant path. If the conditional test passes at the end of B1, control will pass to OL1. OL1 synchronises states (that is, ensures that register-cached values are spilt back to their corresponding memory locations) and then passes control to a piece of glue code to effect resumption of the unpredicted (non-dominant) path corresponding to calculations 2 via a fall back interpreter. Until such time as the corresponding bytecodes of the non-dominant path execute frequently enough to warrant dynamic compilation, these continue to be interpreted, thus saving space in the code buffers (which are limited resources) for more important paths of bytecode execution. Thus outliers of the type of OL1 handle the case where normal control flow takes an unpredicted path away from the dominant path, such as needing to execute calculations 2.

[0430] An example of code of an outlier such as OL1 is as follows:

[0431] a=r_(n) //update states and restore memory locations for a, b, c

[0432] b=r_(m)

[0433] c=r_(s)

[0434] callglue (3000) // calls the glue code and tells it to interpret uncompiled code from bytecode address 3000

[0435] The interpreter will start execution at the beginning of block B2. If the bytecode at 3000 is executed often enough, it will later be compiled. The next time the glue is told to interpret from 3000, it will recognise that there is a compiled version of B2. It will amend the ‘callglue’ line of OL1 (automatically) to ‘goto . . . ’ to direct control to the compiled version. This is known as “patching” (see Agent's Reference No. 12 of this specification). Thus, the next time the outlier OL1 is called, the control will be transferred directly to B2, without the glue being used. (See also Agent's Reference No. 1 of this specification).
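The patching behaviour described for OL1 can be modelled in Java as follows. This is a sketch under stated assumptions and not the mechanism of Agent's Reference No. 12: real patching rewrites a machine instruction, whereas here a reassignable field (transfer) stands in for the ‘callglue’ line. The names OutlierPatchSketch, Outlier, outlierFor, glue, interpret and compiledFragments, and the placeholder value returned by interpret, are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntSupplier;

/** Hypothetical model of glue code and an outlier whose final transfer can be patched. */
final class OutlierPatchSketch {
    /** Compiled fragments known to the glue, keyed by bytecode address. */
    final Map<Integer, IntSupplier> compiledFragments = new HashMap<>();
    int glueCalls = 0;
    int interpretedRuns = 0;

    /** Models an outlier whose last instruction starts out as "callglue (address)". */
    final class Outlier {
        final int bytecodeAddress;
        IntSupplier transfer;   // replaced by a direct transfer once patched

        Outlier(int bytecodeAddress) {
            this.bytecodeAddress = bytecodeAddress;
            this.transfer = () -> glue(this);
        }

        int enter() {
            // A real outlier would first spill register-cached values back to memory.
            return transfer.getAsInt();
        }
    }

    Outlier outlierFor(int bytecodeAddress) {
        return new Outlier(bytecodeAddress);
    }

    /** Glue: use a compiled fragment if one exists and patch the caller, else interpret. */
    int glue(Outlier caller) {
        glueCalls++;
        IntSupplier fragment = compiledFragments.get(caller.bytecodeAddress);
        if (fragment == null) {
            interpretedRuns++;
            return interpret(caller.bytecodeAddress);
        }
        caller.transfer = fragment;   // "callglue" becomes a direct "goto"
        return fragment.getAsInt();
    }

    /** Stand-in for the fallback interpreter; the result is an arbitrary placeholder. */
    int interpret(int bytecodeAddress) {
        return bytecodeAddress + 1;
    }

    public static void main(String[] args) {
        OutlierPatchSketch vm = new OutlierPatchSketch();
        Outlier ol1 = vm.outlierFor(3000);
        ol1.enter();                                   // interpreted via the glue
        vm.compiledFragments.put(3000, () -> 3001);    // B2 has since been compiled
        ol1.enter();                                   // glue finds it and patches OL1
        ol1.enter();                                   // direct transfer, glue not used
        System.out.println("glue calls: " + vm.glueCalls);
    }
}
```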

[0436] A different type of outlier, OL2, deals with the situation in which an exception condition is recognised within the dominant path (for example, block B1 attempts to access an array outside of its legal bounds). The dominant path passes control to an outlier (OL2) to deal with the exception. Here, the outlier synchronises state as usual, and then passes control to the glue code to raise the exception within the virtual machine.

[0437] An example of code of an outlier such as OL2 is as follows:

[0438] a=r_(n) // update states, restore memory locations for a, b and c

[0439] b=r_(m)

[0440] c=r_(s)

[0441] callglue raise exception X // tell glue to transfer control to an exception handler for dealing with an exception of type X

[0442] Further discussion of the use of glue code and the transfer of control to the interpreter can be found in the section Agent's Reference No. 1 of this specification.

[0443] Only two outliers have been shown in FIG. 3C for clarity. In practice, separate outliers would be provided to deal with each exception and each deviation from the dominant path which could occur.

[0444] Outliers are spatially far separated from those blocks of code corresponding to their associated dominant paths. A given compilation produces a set of blocks for the dominant path and another set of blocks of outliers used by the dominant path when unpredicted or exceptional behaviour is encountered during execution.

[0445] FIG. 3D shows the blocks of compiled code and the outliers filled into a code buffer 9054. The dominant path blocks are filled into the buffer in the direction 9056 and the outliers are filled in the direction 9058. The dominant path blocks occupy one end of the code buffer, and their outliers the other end. Each compilation of a new fragment of code produces new sets of dominant path blocks and outliers and the code buffer is laid out so that the outliers and dominant path blocks grow towards each other. Hence it can be seen that in the normal course of execution, where outliers are not executed, their presence is transparent in the system with respect to the processor cache behaviour. Thus maximum cache density of useful code, and maximum address space density of useful code, is possible.

[0446] The code buffer is managed by the compiler manager which indicates where the pointers are at the high memory and low memory ends of the buffer. As the compiled code is generated for a block, the compiled version of the block will be entered in the buffer, followed by the block of code for the outlier(s). The code for the outlier is then moved to the opposite end of the buffer. Thus the dominant path blocks and outlier blocks fill the buffer from separate ends. This improves cache density and reduces paging problems.
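A code buffer filled from both ends, with a low pointer and a high pointer, might be sketched in Java as follows. The class CodeBufferSketch and its methods are hypothetical, byte arrays stand in for generated machine code, and, as a simplification, outlier code is written directly at the high end rather than being entered after the dominant path block and then moved as described above.

```java
/** Hypothetical sketch of a code buffer in which two regions grow towards each other. */
final class CodeBufferSketch {
    private final byte[] buffer;
    private int low;    // next free offset for dominant path code (grows upwards)
    private int high;   // start of the outlier region (grows downwards)

    CodeBufferSketch(int size) {
        buffer = new byte[size];
        low = 0;
        high = size;
    }

    /** Appends dominant path code at the low end; returns its start offset. */
    int emitDominant(byte[] code) {
        ensureRoom(code.length);
        int start = low;
        System.arraycopy(code, 0, buffer, start, code.length);
        low += code.length;
        return start;
    }

    /** Places outlier code at the high end; returns its start offset. */
    int emitOutlier(byte[] code) {
        ensureRoom(code.length);
        high -= code.length;
        System.arraycopy(code, 0, buffer, high, code.length);
        return high;
    }

    /** Bytes remaining between the two regions. */
    int freeBytes() {
        return high - low;
    }

    private void ensureRoom(int length) {
        if (length > high - low) {
            throw new IllegalStateException("code buffer full");
        }
    }

    public static void main(String[] args) {
        CodeBufferSketch buf = new CodeBufferSketch(64);
        System.out.println("dominant at " + buf.emitDominant(new byte[10]));
        System.out.println("outlier at " + buf.emitOutlier(new byte[4]));
        System.out.println("free: " + buf.freeBytes());
    }
}
```

With this layout, successive dominant path fragments remain adjacent to one another however many outliers are generated, which is the property relied on for cache density.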

[0447] In an alternative embodiment, the blocks of dominant path code and outliers can be filled from the same end of the buffer, but in blocks for each fragment of code. In the example above, the buffer would include (in order) B1, B3, B5, OL1, OL2, OL3 . . . The next fragment to be compiled would also lay down the code for the dominant path blocks followed by that for the outliers. That arrangement is, however, less preferred since address space is being used up by the outliers and there is a greater chance that code of the outliers will be pulled into the cache.

[0448] As FIG. 3E of the drawings indicates, a processor chip 9200 may operate at a speed of 400 MHz and be associated with an on-board, first level memory cache 9202 of 16K. A second level cache 9204 of say 512K would be associated with the chip 9206. These are in addition to the normal RAM 9208 of perhaps 32 MB operating at a speed considerably less than the 400 MHz of the first and second level cache memories. In operation, the processor would pull instructions in from the cache a line at a time (32 bytes). By ensuring that the most frequently used code, that is, the compiled dominant path code, is stored in a separate memory region from the less frequently used code, the density of the most frequently used instructions in the cache can be increased. In the process, less frequently used instructions will also be stored together but in non-cache memory and will thus not pollute the cache.

[0449] Identification of the Frequently Executed Fragments

[0450] In order to separate the frequently executed fragments from infrequently executed fragments of a section of code, it is necessary first to identify those fragments which are frequently executed. This can be accomplished by analysing an execution run of the code and identifying the most frequently executed paths through the code (the dominant path). The dominant path can be determined from a previous run of the code. In the present embodiment of the invention, the dominant path is determined dynamically on line during a run. Detailed discussion of the determination of the dominant path can be found under the heading Agent's Reference No. 1 of this specification. In summary, the number of times each block of code is executed is recorded by an execution history recorder. The execution history recorder notes that the block has been executed and also notes from where the control has passed into the block and also notes the successor of the block (to where the control passes from the block). From that information, the most popular successors of each block can be determined and thus the dominant path can be found.

[0451] In the case where the code is code of a Java application, the code is first translated by an interpreter. The execution history recorder is run by the interpreter and records information about the interpretation of each block. Once a block has been executed a threshold number of times by the interpreter, the interpreter passes details of the block to a queue for compilation which is managed by a compiler manager. The threshold number of times may be 5. When the compiler manager inspects the queue and takes the block for compilation, it traces the dominant path from the block using the information recorded by the execution history recorder regarding the interpretation of the block and its most popular successors. The compiler then produces a compiled version of the dominant path fragment of code as described in more detail below.
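The recording and tracing steps described above might be sketched in Java as follows. This is an illustration under stated assumptions rather than the implementation of the invention: the class ExecutionHistoryRecorder, its method names and the use of block names as strings are hypothetical, and the record of where control came from is omitted for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch of an execution history recorder.
class ExecutionHistoryRecorder {
    static final int THRESHOLD = 5; // example threshold given in the text

    private final Map<String, Integer> execCounts = new HashMap<>();
    private final Map<String, Map<String, Integer>> successorCounts = new HashMap<>();
    final Queue<String> compileQueue = new ArrayDeque<>();

    // Called by the interpreter each time 'block' runs and control then
    // passes to 'successor' (null if there is no recorded successor).
    void record(String block, String successor) {
        int count = execCounts.merge(block, 1, Integer::sum);
        if (successor != null) {
            successorCounts.computeIfAbsent(block, k -> new HashMap<>())
                           .merge(successor, 1, Integer::sum);
        }
        if (count == THRESHOLD) {
            compileQueue.add(block); // queue the block for compilation once
        }
    }

    // Follows the most popular successor of each block, starting at 'start'.
    List<String> traceDominantPath(String start) {
        List<String> path = new ArrayList<>();
        Set<String> seen = new HashSet<>(); // stops the trace on a loop
        String current = start;
        while (current != null && seen.add(current)) {
            path.add(current);
            String best = null;
            int bestCount = 0;
            Map<String, Integer> succ = successorCounts.get(current);
            if (succ != null) {
                for (Map.Entry<String, Integer> e : succ.entrySet()) {
                    if (e.getValue() > bestCount) {
                        best = e.getKey();
                        bestCount = e.getValue();
                    }
                }
            }
            current = best;
        }
        return path;
    }
}
```

If B1 is interpreted five times, four times followed by B3 and once by B2, the sketch queues B1 on the fifth execution and the trace from B1 follows B3 rather than B2.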

[0452] For example, for a section of non-compiled code having a general structure as that shown schematically in FIG. 3A, a path of execution through the blocks of code is usually B1, B3, B5. When the block B1 has been executed 5 times, it is queued for compilation. The compiler traces the dominant path from B1 and finds that, although the exceptions sometimes occurred, the most popular successor of B1 was B3, and the most popular successor of B3 was B5. Thus the dominant path from B1 is B1, B3, B5. The compiler then proceeds to produce a compiled version of the dominant path.

Compilation of the Dominant Path

Full compiled versions of the infrequently executed pieces of code B2 and B4 are not prepared. In an alternative embodiment, compiled versions of the code could be prepared but compilation of those sections would take time and the compiled versions would occupy memory space and thus this alternative embodiment is not attractive where there is limited memory, for example in a virtual machine.

[0453] The fragments B1, B3, B5 are laid out sequentially (see fragments B1, B3, B5 of FIG. 3C). Optimisations are made in the compilation of the code, for example using known optimisation techniques. Exception checks are inserted at relevant positions in the compiled code, the exception checks corresponding to the checks originally in the blocks B1, B3, B5 of the non-compiled code. The exception checks each include a jump to a relevant piece of code called an outlier (OL2 is shown for the exception in B1). As indicated above, it is preferred that the outlier does not just contain a compiled version of the code B4 for handling the exceptions. The outliers include code for updating any necessary states and registers before transfer of control out of the compiled version of code.

[0454] For example, where the compiled code has been optimised, at the time of the conditional transfer corresponding to that at the end of block B1, some states may not yet have been updated at the end of block B1. Also, the compiled version of the code may hold states in different memory locations to those of the original code. The outlier OL1 updates all of the states and registers to what they would have been at the transfer of control out of the block B1 into B2. The outlier OL1 then transfers control to a conversion device which transfers control to the interpreter which then proceeds to interpret the code for B2. Once the exception has been handled, if appropriate, the control can be passed back, via the glue code, to the outlier, which reinstates the states which had been updated and execution of the compiled code can resume at block B3. See Agent's Reference No. 1 of this specification for a further discussion of the role of the conversion device and the glue code.

[0455] It will be appreciated that, in most cases, an exception will not occur and the execution will simply pass through the blocks B1, B3, B5.

[0456] As indicated above, the compiled code is generated in the code buffer forwards and the outliers are generated in the code buffer backwards so that they are spatially separated in the buffer. Thus the outliers are less likely to be pulled into a cache. Although the execution of the exceptions (via the outliers) might be slower than for the case where the infrequently executed code was cached with the dominant path code, that decrease in speed is more than compensated for by the increased speed of execution of the dominant path, especially where the infrequently executed code is very rarely executed.

[0457] Apparatus for carrying out the method of the present invention is shown schematically in FIG. 3F. The apparatus includes an interpreter 9300 for interpreting code. An execution history recorder 9302 records details of the execution of the code by the interpreter 9300. When a block is executed the predetermined number of times, the execution history recorder 9302 notifies the compiler manager 9304 which administers a queue of blocks for compilation. The compiler 9306 consults the queue and takes blocks for compilation, determines the dominant path from the records of the execution history recorder 9302 and compiles the dominant path fragment and prepares any necessary outliers for the fragment. The compiled fragments are loaded into the code buffer 9308. The dominant path fragments are loaded forwards in the buffer 9308 and the outliers are loaded backwards in the buffer 9308. At some time, lines of the compiled code in the buffer 9308 are pulled into the cache 9310. Compiled code is executed from the buffer 9308 or from the cache 9310 by the execution device 9312. If an exception is encountered which cannot be handled by the dominant path code, the outlier 9314 updates any necessary states and transfers to the glue code 9316 which transfers control to the interpreter 9300 which proceeds to interpret code for the handling of the exception.

[0458] In any or all of the aforementioned, certain features of the present invention have been implemented using computer software. However, it will of course be clear to the skilled man that any of these features may be implemented using hardware or a combination of hardware and software. Furthermore, it will be readily understood that the functions performed by the hardware, the computer software, and such like are performed on or using electrical and like signals.

[0459] Features which relate to the storage of information may be implemented by suitable memory locations or stores. Features which relate to the processing of information may be implemented by a suitable processor or control means, either in software or in hardware or in a combination of the two.

[0460] In any or all of the aforementioned, the invention may be embodied in any, some or all of the following forms: it may be embodied in a method of operating a computer system; it may be embodied in the computer system itself; it may be embodied in a computer system when programmed with or adapted or arranged to execute the method of operating that system; and/or it may be embodied in a computer-readable storage medium having a program recorded thereon which is adapted to operate according to the method of operating the system.

[0461] As used herein throughout the term ‘computer system’ may be interchanged for ‘computer,’ ‘system,’ ‘equipment,’ ‘apparatus,’ ‘machine’ and like terms. The computer system may be or may include a virtual machine.

[0462] In any or all of the aforementioned, different features and aspects described above, including method and apparatus features and aspects, may be combined in any appropriate fashion.

[0463] It will be understood that the present invention(s) has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.

[0464] Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

[0465] Agent's Reference No. 4 - Computer System, Computer-Readable Storage Medium and Method of Operating Same, and Method of Operating That System

[0466] The invention preferably relates to optimized execution of object oriented languages which use the ‘interface’ abstraction, and in particular Java. In a preferred embodiment, the invention relates to Dispatch Mechanism for Interface Methods.

[0467] Java supports single inheritance of class types, with interfaces. Interfaces themselves can be multiply inherited from other interfaces. When a concrete class claims to implement a set of interfaces, it must provide or inherit implementations of every method directly or indirectly defined by those interfaces. (See Reference [2] listed under Other Information at the end of Agent's Reference No. 4 in this specification).

[0468] In object oriented programming, objects are classified in a hierarchical structure with each object associated with attributes (data about its features or properties) and methods (functions it may perform). Typical such functions might be ‘ring’ in the context of a mobile or other telephone, or ‘play’ in the context of audio and/or video reproduction equipment. As one of the features in object-oriented languages, such as Java, the attributes and methods of a super class of objects are ‘inherited’ by its subclasses.

[0469] For example, as shown in FIG. 4A, “mode of transportation” 400 is the superclass of both ‘bike’ 402 and ‘car’ 404 classes of objects. The ‘car’ sub-class could be subdivided into ‘saloon’ 406 and ‘sports’ 408 and further subdivision is possible according to, for example, the make or model of sports car etc. Certain attributes of the ‘car’ sub-class, such as the number of wheels, model, and so on, will be inherited by the ‘saloon’ and ‘sports’ sub-classes. In a similar vein, methods such as ‘turn on lights’ can be common to cars within the hierarchy, but in some sub-classes the methods themselves may differ to the extent that a certain function has to be performed before lights can actually be turned on. For instance, a sports car with pop-up headlights may need to raise the lights before they can be turned on. In such a case, the inheritance has to be overridden by the need to perform a function before the function in question can be performed.

[0470] In another context, the user of a mobile or other telephone may wish to arrange for his handset to emit a different ring depending on whether the call was business or social. In this context, ‘ring’ would be termed an ‘interface.’ Its significance is that ‘ring’ is a function that a variety of objects in the hierarchy would perform (like ‘turn on lights’ in the car example above) but the actual implementation would differ from object to object. Interfaces therefore cut across hierarchies. An interface is thus a list of functions that the object can perform (such as ‘ring’ or ‘play’ or ‘record’ and so on).

[0471] Single inheritance is usually implemented using dispatch tables (otherwise known as virtual function tables). A subclass inherits the dispatch table of its superclass, extending it with any new methods, and replacing entries which have been overridden.
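The construction of a subclass dispatch table from that of its superclass can be sketched as follows. The names DispatchTableSketch, Slot, derive and slotOf are hypothetical, and implementations are represented simply as strings; a real virtual machine would store code addresses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of dispatch table inheritance under single inheritance.
class DispatchTableSketch {
    // One slot of a dispatch table: a method name and what it resolves to.
    static final class Slot {
        final String name;
        final String impl;
        Slot(String name, String impl) {
            this.name = name;
            this.impl = impl;
        }
    }

    // Builds the table of a subclass from its superclass table and the
    // methods the subclass itself declares.
    static List<Slot> derive(List<Slot> parent, List<Slot> declared) {
        List<Slot> table = new ArrayList<>(parent); // inherit every slot
        for (Slot s : declared) {
            int idx = slotOf(table, s.name);
            if (idx >= 0) {
                table.set(idx, s); // override: the slot index is unchanged
            } else {
                table.add(s);      // new method: appended after inherited slots
            }
        }
        return table;
    }

    // Returns the slot index of the named method, or -1 if it is absent.
    static int slotOf(List<Slot> table, String name) {
        for (int i = 0; i < table.size(); i++) {
            if (table.get(i).name.equals(name)) {
                return i;
            }
        }
        return -1;
    }
}
```

Because an overriding method keeps the slot index used by the superclass, a caller can invoke it by index without knowing the concrete class of the object.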

[0472] Multiple inheritance in languages such as C++ is normally implemented using multiple dispatch tables and offsets (see Reference [1] listed under Other Information at the end of Agent's Reference No. 4 in this specification).

[0473] The relevant data is stored in slots in a dispatch table illustrated schematically in FIG. 4B. The attributes of an object in a table 410 are always located at the same distance from the start of the object. The object includes a pointer 412 to a dispatch table of methods 414 which are always at the same distance from the start for the same function. However, when interface methods are used, as explained above, there is no longer any certainty of knowing in which slot of the dispatch table the particular function appears. This is a problem peculiar to multiple inheritance, and particularly to the interfaces found in the Java language.

[0474] Up to now, the whole of the dispatch table had to be interrogated to check that the method accessed was the proper method. It had been realised that, ideally, a unique identifier would be needed for the interfaces, but in practice the table cannot be of such a size that everything within it has a unique identifier.

[0475] Reverting to the ‘play’ function analogy, there would be one dispatch table for video recorder and one for tape recorder. Each would have different interface references, so ‘play’ might be at position 2 for video recorder and position 22 for tape recorder.

[0476] The logical definition of invoking an interface method is to search the list of methods implemented directly or indirectly by the given class of object. This is clearly slow. This can be improved by searching a ‘flat’ structure which mirrors the dispatch table.

[0477] Reference [3] listed under Other Information at the end of Agent's Reference No. 4 in this specification describes an optimization where the last offset at which the interface method was found is remembered, and tried as a first guess next time the invoke interface is encountered. If the guess turns out to be wrong, a fuller search is performed. This approach is based on the assumption that a given call site will tend to operate on the same type of objects.

[0478] Even if the guess is right, the destination method has to be checked to confirm that it is. In the cases where the guess is wrong, a fairly slow search is needed.
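The known guess-and-check technique discussed in the two preceding paragraphs can be illustrated with the following Java sketch. The class GuessingCallSite and the representation of a dispatch table as an array of method names are assumptions made for illustration; they are not taken from Reference [3].

```java
// Hypothetical sketch of a call site that remembers the last slot at which
// an interface method was found and tries it first next time.
class GuessingCallSite {
    private int lastSlot = -1; // slot where the method was found last time
    int slowSearches = 0;      // counts how often the full search was needed

    // 'table' holds the method names of the receiver's class, one per slot.
    int findSlot(String[] table, String wantedMethod) {
        // Even a correct guess has to be checked against the destination.
        if (lastSlot >= 0 && lastSlot < table.length
                && table[lastSlot].equals(wantedMethod)) {
            return lastSlot;
        }
        // Wrong guess (or first call): fall back to the slower search.
        slowSearches++;
        for (int i = 0; i < table.length; i++) {
            if (table[i].equals(wantedMethod)) {
                lastSlot = i;
                return i;
            }
        }
        throw new IllegalArgumentException("method not implemented: " + wantedMethod);
    }
}
```

The sketch shows both costs named above: every call pays for the comparison, and a call on a different type of object pays for the full search.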

[0479] Another approach would be to use an analog of the way C++ multiple inheritance is supported.

[0480] The invention solves this problem by a method for reducing dispatch times during the execution of a program in object-oriented language, which program has a number of interface methods, the method including the steps of:

[0481] (i) creating dispatch tables;

[0482] (ii) creating an interface hash table for one or more of the dispatch tables, the interface hash table having a pointer either as an index into a specific location in the corresponding dispatch table, or to a method stored on a computer executing the program;

[0483] (iii) when the program executes a step requiring a particular interface method, using the interface hash table to look up the latter interface method, either via the dispatch table, or directly.

[0484] Whereas the latter method applies in the case where the invention is applied to the program, the invention can also be applied in the form of a “virtual machine” wherein software emulates a “virtual” computer system in order to run a “foreign” application. However, steps (ii) and (iii) above are still similarly applied.

[0485] More particularly, the invention also provides a method for reducing dispatch times wherein a virtual machine, and a set of programs executed by the virtual machine are stored on a computer readable medium (such as a CD); the virtual machine being operative to reduce dispatch times in the course of program execution by:

[0486] (i) creating dispatch tables;

[0487] (ii) creating an interface hash table for one or more of the dispatch tables, the interface hash table having a pointer either as an index into a specific location in the corresponding dispatch table, or to a method stored on a computer executing the program;

[0488] (iii) when the program executes a step requiring a particular interface method, using the interface hash table to look up the latter interface method, either via the dispatch table, or directly.

[0489] These methods of reducing dispatch time can clearly be specifically applied to Java.

[0490] In one embodiment of the invention, there is one interface hash table per dispatch table. In another embodiment of the invention, there is a single interface hash table for all the dispatch tables.

[0491] In one form of the invention, the dispatch table points to the interface hash table. In another form of the invention, the hash table is part of the dispatch table at the start. This latter form of the invention thereby eliminates one level of indirection.

[0492] The interface hash table can contain, for example, slot numbers of the dispatch table. Alternatively, the interface hash table can contain function pointers, thereby eliminating one level of indirection.

[0493] Chief advantages of at least preferred embodiments of the invention are that it is fast in the majority of situations. It uses no support routines in the common case, and does not need checks on the caller. This makes the common case fast, and makes the generated code smaller. It also has very little memory overhead, since a small hash table is needed only in the case where a class implements an interface. Small and fast are important qualities for uses such as Mobile Telephones where memory is limited on account of size or cost.

[0494] The method of the invention preferably includes the step of calling a special recovery method, in the event of a collision occurring when looking up the same interface method in the interface hash table. In this case, the hash table can either point to a method stored in the computer, or to a fallback slot in the dispatch table, which will redirect the call to an appropriate stored method, which is designed to “sort out” the class and direct the call to the appropriate location.

[0495] According to the invention in its broadest aspect, the solution to this problem is to use an extra level of indirection through a hash table.

[0496] For the majority of cases where there is no clash in the hash table, invoking an interface is only slightly slower than a standard virtual dispatch, and faster than the known techniques for invoking interface methods. It is also expected to be more compact than the C++ multiple inheritance approach, especially when dispatch table slots contain more than one word of information.

[0497] Where there is a clash in the interface hash table, a fallback slot in the dispatch table performs the slow but sure search.

[0498] According to other aspects of the invention, the problem of fast access to the required information is solved or alleviated by the use of an interface hash table as well as a dispatch table for each of the various devices.

[0499] The following considerations apply to any and all of the inventions and aspects of the inventions described above.

[0500] Preferred embodiments of the invention will now be described, purely by way of example having reference to the accompanying figures of the drawings (which represent schematically the improvements) in which:

[0501] FIG. 4A illustrates a hierarchical structure in object-oriented programming;

[0502] FIG. 4B shows the arrangement of data stored in dispatch tables;

[0503] FIG. 4C shows the application of an interface hash table to a dispatch table;

[0504] FIG. 4D is a hierarchical structure of a domestic equipment system;

[0505] FIG. 4E shows dispatch tables used in operating devices in the domestic system of FIG. 4D; and

[0506] FIG. 4F shows a controller program with driver devices for operating the devices in the domestic system of FIG. 4D.

[0507] An embodiment of the invention will now be described by way of example only, to illustrate how a “virtual machine” can be applied in practice. It will be appreciated that this is just an illustrative example, because the “virtual machine” can be applied to very many different systems. Examples of these include Mobile Telephones (which incorporate hand-held computers); Set Top Boxes for digital television; Video Equipment which is intended for use with MPEG digital systems; and intelligent Disc Drives. The invention is particularly useful where, due to physical size (e.g., Mobile Telephones) memory is limited and more efficient modes of executing programs, using an object-oriented language such as Java, can be used. The memory onboard a Mobile Telephone may be limited, for example, to less than 500 kB, and it is in environments with limited memory that the invention works well. However, it can also run well for memories above this.

[0508] FIGS. 4D-4F schematically illustrate an example of employing a virtual machine to a domestic environment where a computer (not shown), or microcontroller (not shown), is equipped with a controller program 460 for controlling the state of operating devices 461-464 used in controlling the supply or flow of WATER (e.g. valves); HEAT (e.g. timers, valves, pumps); and LIGHTS (e.g. switches); and also controlling the operation of a VIDEO system (e.g. switches). These operating devices 461-464 are each shown connected to respective device drivers 465-468 which receive appropriate command signals from the Controller Program 460 during execution of a program, so that appropriate drives are given to the switches, valves, pumps, etc. to produce the required action. Input 469 enables the Controller Program to be tailored to the user's requirements whereby, for example, at preset times, the heating system is turned on and off (and its temperature is adjusted), the video system is caused to play; and so on.

[0509] Referring now to FIG. 4D, there are shown various parts of a domestic system represented as objects that are classified in a hierarchical structure where DEVICE is a class having the method of on and off that is common to the sub-classes HEATING SYSTEM and ELECTRICAL DEVICE and the subsequent sub-classes HOT WATER SYSTEM (or the domestic hot water used for washing); CENTRAL HEATING (which is a closed circulation system used in space heating); LIGHTS (which include the lights in each room); and VIDEO (which includes the control functions for playing, recording, ejecting cassettes, etc.). In addition, the HEATING SYSTEM has the method of Set Temperature, which enables control of room temperature; the HOT WATER SYSTEM has the method Reload (which is intended to indicate when a water softener cartridge needs to be changed); the LIGHTS sub-class has the method Dim; and the VIDEO sub-class has the attributes Play and Reload cassette.

[0510] FIG. 4E shows the Dispatch Tables for this class and its sub-classes. In all Dispatch Tables, ON and OFF functions occupy positions 1 and 2. However, position 3 for the HOT WATER SYSTEM and the CENTRAL HEATING SYSTEM is Set Temperature, whereas the same position 3 is Dim for lights and Reload for WATER and VIDEO. The method Reload will need to distinguish between reloading a cartridge in the water softener and reloading a cassette in the Video system, but the Reload attribute is otherwise similar. Only a few control functions have been illustrated in FIG. 4E to simplify the drawings and description, and their dispatch tables will normally contain many more slots or entries.

[0511] It is clear from FIGS. 4D-4F that an interface exists between the class/sub-classes (or control functions, i.e. methods) where the same method is used in controlling a similar function in the operating devices. One interface is the ON/OFF method; another interface is the RELOAD method. Each interface method is allocated a small hash value. This interface hash value can be derived in many ways, but must not exceed the size of the hash table. Preferably, the hash values are chosen to reduce as far as possible conflicts between interface methods. One way of doing this is to derive pseudo-random hash values from each interface method's name, or some other fairly random attribute of the interface method.

[0512] Preferably, choose a starting hash value which does not collide with any related interface classes, and then allocate hash numbers for each member method of the interface sequentially from this. Hash values should be chosen so that methods of the same interface or related interfaces have unique hash values and do not conflict or clash. Clearly an object which implements many interfaces or interfaces with many methods may not be able to avoid clashes. A larger hash table usually reduces the number of clashes.
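Both ways of deriving hash values mentioned above can be sketched in Java. The class InterfaceHashAllocator, the table size of 8 and the wrap-around at the end of the table are assumptions made for illustration; the specification states only that a hash value must not exceed the size of the hash table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of allocating small hash values to interface methods.
class InterfaceHashAllocator {
    static final int HASH_TABLE_SIZE = 8; // assumed size, not from the text

    // Gives the methods of one interface consecutive hash values starting
    // at 'start', wrapping round so every value stays inside the table.
    static Map<String, Integer> allocate(int start, String... methods) {
        Map<String, Integer> hashes = new LinkedHashMap<>();
        int next = start;
        for (String m : methods) {
            hashes.put(m, next % HASH_TABLE_SIZE);
            next++;
        }
        return hashes;
    }

    // Alternative: a pseudo-random value derived from the method's name.
    static int fromName(String methodName) {
        return Math.floorMod(methodName.hashCode(), HASH_TABLE_SIZE);
    }
}
```

With sequential allocation, methods of the same interface cannot clash with one another unless the interface has more methods than the table has entries.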

[0513] FIG. 4C illustrates an embodiment of the invention wherein the data for an object (e.g. Video) within a particular hierarchy (e.g. FIG. 4D) is located in a data structure such as a table 420. The data structure will contain a header and a plurality of object data fields. When a call is made for a relevant method stored in slots in dispatch table 422, because of the uncertainty in knowing the exact slot in which that method is located, the dispatch table 422 will automatically re-route the call to a hash table 424 containing a condensed version of the method locations in the dispatch table 422. Also, because the locations within the hash table 424 are always the same for each method, the hash table will be able to generate an index pointer 426 leading to the correct location in the dispatch table 422 more quickly than searching all possible locations within the dispatch table. The same process is followed with other hash tables (not shown) and their respective dispatch tables.

[0514] In the event of a clash in the hash table, because the same location is needed for two interface methods, the hash table will point to a method stored in the computer designed to ‘sort out’ the clash and direct the caller to the appropriate location. This can also be done by first pointing to a slot (e.g., the first) in the dispatch table 422 which then points to the “sort out” method stored in the computer.

[0515] More generally speaking, each dispatch table is created after defining each concrete class and when the set of methods it implements is known. (The dispatch table will take into account method implementations inherited from its superclass). A fixed size hash table is created for each class which maps the interface method hash value described above to a dispatch table index of the corresponding implementation. Where a class implements two or more interface methods which have the same interface hash value, the hash table is set to contain the dispatch table index of the fallback routine for “sorting out” a clash.
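The creation of the fixed size interface hash table for a class might look as follows in Java. The class InterfaceHashTableBuilder is hypothetical. Two assumptions are made that the specification does not state: the fallback routine occupies dispatch table slot 0, and hash table entries not used by any interface method also refer to the fallback slot.

```java
import java.util.Arrays;

// Hypothetical sketch of building the interface hash table of one class.
class InterfaceHashTableBuilder {
    static final int FALLBACK_SLOT = 0; // assumed position of the fallback routine

    // ifaceHashes[i] is the hash value of the i-th interface method the
    // class implements; dispatchSlots[i] is the dispatch table index of the
    // implementation of that method.
    static int[] build(int tableSize, int[] ifaceHashes, int[] dispatchSlots) {
        int[] table = new int[tableSize];
        boolean[] used = new boolean[tableSize];
        Arrays.fill(table, FALLBACK_SLOT);
        for (int i = 0; i < ifaceHashes.length; i++) {
            int h = ifaceHashes[i];
            // A second method with the same hash value is a clash, so the
            // entry is redirected to the fallback routine.
            table[h] = used[h] ? FALLBACK_SLOT : dispatchSlots[i];
            used[h] = true;
        }
        return table;
    }
}
```

For example, three methods with hash values 1, 2 and 2 held in dispatch slots 3, 4 and 5 give a four-entry table in which entry 1 holds slot 3 and every other entry holds the fallback slot.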

[0516] This hash table is either included at the beginning of the dispatch table, or referenced from the dispatch table.

[0517] To invoke an interface method on a given object (in a register),

[0518] a. Load the address of the interface hash table for the given object.

[0519] b. Get the slot number for the specified interface method using its hash as an index into the interface hash table.

[0520] c. Load a unique identifier for the destination interface method into a register.

[0521] d. Given the dispatch table slot number, perform a normal virtual invoke.

[0522] The pseudo assembler sequence for the above steps is:

[0523] Interface Hash Table Pointed to by Dispatch Table

    LOAD Rd, doffs[Ro]        Load dispatch table address
    LOAD Ri, ioffs[Rd]        Load interface hash address
    LOAD Ri, hash[Ri]         Load slot from hash table
    LOAD Ru, #uniqIfaceId     Load unique interface id
    LOAD Ri, [Rd + Ri]        Get method address
    CALL Ri                   Invoke interface method

[0524] In the form of the invention where the hash table is part of the dispatch table, one level of indirection is eliminated.

[0525] Interface Hash Table Stored With (before) Dispatch Table

    LOAD Rd, doffs[Ro]        Load dispatch table address
    LOAD Ri, -hash[Rd]        Load slot from hash table
    LOAD Ru, #uniqIfaceId     Load unique interface id
    LOAD Ri, [Rd + Ri]        Get method address
    CALL Ri                   Invoke interface method

[0526] In the form of the invention where the interface hash table contains method pointers, another level of indirection is eliminated:

[0527] Method Address Stored in Interface Hash Table

[0528] (Plus Previous Optimisation)

    LOAD Rd, doffs[Ro]        Load dispatch table address
    LOAD Ri, -hash[Rd]        Load address from hash table
    LOAD Ru, #uniqIfaceId     Load unique interface id
    CALL Ri                   Invoke interface method

[0529] Where there is a clash between interface method hash entries for a particular class, the interface hash table contains the dispatch table index of a fallback method. The fallback method has access (in registers) to the destination object, and a unique identifier for the interface method. It performs the standard search for that object's implementation of the interface method.
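The invoke sequence and the fallback method can be modelled together in a short Java sketch. All names here (InterfaceDispatchSketch, ClassData, slotById, videoClass and the interface method identifiers) are hypothetical, the fallback is assumed to sit in dispatch slot 0, and the hash values are chosen by hand so that Play and Reload clash. The caller never tests for a clash; it always calls through the slot it finds, passing the unique identifier that the assembler sequences load into Ru.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical model of interface dispatch through an interface hash table.
class InterfaceDispatchSketch {
    // A method receives the receiver's class data and the unique interface
    // method id; it returns a string so the chosen target can be observed.
    interface Method extends BiFunction<ClassData, String, String> {}

    static final class ClassData {
        Method[] dispatchTable;   // slot 0 is assumed to hold the fallback
        int[] interfaceHashTable; // interface hash value -> dispatch slot
        final Map<String, Integer> slotById = new HashMap<>(); // fallback only
    }

    // Fallback: the slow but sure search by unique interface method id.
    static final Method FALLBACK =
        (cls, id) -> cls.dispatchTable[cls.slotById.get(id)].apply(cls, id);

    // The common path: one hash table load, then a normal virtual invoke.
    static String invokeInterface(ClassData cls, int hash, String uniqIfaceId) {
        int slot = cls.interfaceHashTable[hash];
        return cls.dispatchTable[slot].apply(cls, uniqIfaceId);
    }

    // Example class data for VIDEO, with Play and Reload sharing hash value 2.
    static ClassData videoClass() {
        ClassData video = new ClassData();
        video.dispatchTable = new Method[] {
            FALLBACK,
            (c, id) -> "Video.on",
            (c, id) -> "Video.off",
            (c, id) -> "Video.play",
            (c, id) -> "Video.reload"
        };
        video.interfaceHashTable = new int[] {1, 2, 0, 0};
        video.slotById.put("Device.on", 1);
        video.slotById.put("Device.off", 2);
        video.slotById.put("Player.play", 3);
        video.slotById.put("Loadable.reload", 4);
        return video;
    }
}
```

A call with hash value 0 reaches Video.on directly, whereas calls with hash value 2 go through the fallback, which uses the unique identifier to choose between Video.play and Video.reload.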

[0530] It will be known to those of skill in the computing art that a hash table is a means of reducing to manageable proportions a data set where information is sparsely populated and there is otherwise a high degree of redundancy within the data set. A hash table thus can reduce the scale of a whole application and thereby reduce the footprint of the device, one of the important features of Java.

[0531] In summary, the inventions of this patent application include

[0532] 1. Using a Hash for Interface Methods

[0533] Each interface method is allocated a small hash value. This interface hash value can be derived in many ways, but must not exceed the size of the hash table used below.

[0534] It is best if the hash values are chosen to reduce conflicts between interface methods, therefore hash values should be chosen so that methods of the same interface or related interfaces have unique hash values. Clearly an object which implements many interfaces or interfaces with many methods may not be able to avoid clashes.

[0535] Naturally, a larger hash table usually reduces the number of clashes.

[0536] 2. Indirect Through a Hash Table When Invoking Interface Methods

[0537] When each concrete class is defined, the set of methods it implements is known, and a dispatch table is created. The dispatch table takes into account method implementations inherited from its superclass.

[0538] A fixed size hash table is created for each class which maps the interface method hash value described above to a dispatch table index of the corresponding implementation. Where a class implements two or more interface methods which have the same interface hash value, the hash table is set to contain the dispatch table index of the fallback routine described below.

[0539] This hash table is either included at the beginning of the dispatch table, or referenced from the dispatch table.

[0540] To invoke an interface method on a given object (in a register),

[0541] a. Load the address of the interface hash table for the given object.

[0542] b. Get the slot number for the specified interface method using its hash as an index into the interface hash table.

[0543] c. Load a unique identifier for the destination interface method into a register.

[0544] d. Given the dispatch table slot number, perform a normal virtual invoke.

[0545] 3. Fallback Dispatch Table Entry

[0546] Where there is a clash between interface method hash entries for a particular class, the interface hash table contains the dispatch table index of a fallback method. The fallback method has access (in registers) to the destination object, and a unique identifier for the interface method.

[0547] It performs the standard search for that object's implementation of the interface method.
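The three numbered items above can be illustrated with a minimal sketch. It is not the patented implementation: the table size, the toy hash (method name length modulo the table size), the use of slot 0 for the fallback routine, and the names ClassInfo, Obj and invoke_interface are all assumptions introduced here for illustration. Unused hash entries are also pointed at the fallback, which is a further assumption.

```python
HASH_TABLE_SIZE = 8   # assumed fixed size of every per-class interface hash table
FALLBACK_SLOT = 0     # assumed: dispatch table slot 0 holds the fallback routine


def interface_hash(method_id):
    """Toy hash (an assumption): any function with results below the table size works."""
    return len(method_id) % HASH_TABLE_SIZE


class ClassInfo:
    """Holds one dispatch table and one interface hash table for a concrete class."""

    def __init__(self, implementations):
        # implementations: dict mapping interface method id -> callable(obj, method_id, *args)
        self.implementations = dict(implementations)
        self.dispatch_table = [self._fallback]
        self.hash_table = [FALLBACK_SLOT] * HASH_TABLE_SIZE
        seen = set()
        for method_id, impl in self.implementations.items():
            self.dispatch_table.append(impl)
            slot = len(self.dispatch_table) - 1
            h = interface_hash(method_id)
            # On a clash, the entry is redirected to the fallback routine.
            self.hash_table[h] = FALLBACK_SLOT if h in seen else slot
            seen.add(h)

    def _fallback(self, obj, method_id, *args):
        # Stand-in for the "standard search" for the object's implementation.
        return self.implementations[method_id](obj, method_id, *args)


class Obj:
    def __init__(self, cls):
        self.cls = cls


def invoke_interface(obj, method_id, *args):
    table = obj.cls.hash_table                # step a: load the interface hash table
    slot = table[interface_hash(method_id)]   # step b: hash indexes the table
    # step c: method_id plays the role of the unique identifier held in a register
    return obj.cls.dispatch_table[slot](obj, method_id, *args)  # step d: virtual invoke
```

With this toy hash, "play" and "stop" clash (both have length 4) and are resolved through the fallback, while "rewind" is dispatched directly through its own slot.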

[0548] It will be known to those of skill in the computing art that a hash table is a means of reducing to manageable proportions a data set where information is sparsely populated and there is otherwise a high degree of redundancy within the data set. A hash table thus can reduce the scale of a whole application and thereby reduce the footprint of the device, one of the important features of Java. Overflows are taken into account in a way which is already known in the utilisation of hash tables.

[0549] Also according to the invention, therefore, a computer system includes one or more dispatch tables for storing data containing methods appropriate to objects in a class hierarchy and an interface hash table pointing to the location in the dispatch table where a method of interest is located.

[0550] The invention also provides a method of operating a computer system which uses dispatch tables containing methods appropriate to objects in a class hierarchy, including the steps of: directing a call for a method to the dispatch table; passing on the call to a hash table containing information as to the location of methods in the dispatch table; and redirecting the call from the hash table to that location in the dispatch table where the method is stored.

[0551] The invention also provides a computer system including means for storing data relating to an object, means for calling data relating to a method appropriate to the object, a dispatch table adapted to contain data relating to at least one such method, means for passing the call on to a hash table containing information as to the location of method(s) in the dispatch table and means for redirecting the call from the hash table to the dispatch table to access the location of the called method.

[0552] In one form of the invention, there is one interface hash table per dispatch table. In another form of the invention, there is a single interface hash table for all the dispatch tables.

[0553] Alternatively, the invention provides both a method of improving the performance of interface dispatching by using a hash table and a computer system including a hash table to improve the performance of interface dispatching.

[0554] In another aspect, the invention provides a method or a computer system in which the interface reference for a particular method is found by means of a hash table.

[0555] It will be understood that ‘interface dispatching’ is the method by which the slot location for a particular method, e.g., the slot location number (2) for the ‘play’ function of a video recorder, is located and then the relevant data is called.

[0556] Chief advantages of the invention may include faster interface dispatching and/or a reduction in the size of footprint.

[0557] In each case, the method or computer system of the invention as specified in the preceding paragraphs may be applied specifically to Java.

[0558] The operation of the system can be looked at in another way. Thus, in FIG. 4C of the drawings, the data for an object within a particular hierarchy is located in a data structure such as a table 420. The data structure will contain a header and a plurality of frames containing relevant data. When a call is made for a relevant method stored in slots in a dispatch table 422, because of the uncertainty in knowing the exact slot in which that method is located, the dispatch table 422 will automatically re-route the call to a hash table 424 containing a condensed version of the method locations in the dispatch table. Also, because the locations within the hash table are always the same for each method, the hash table will be able to generate an index pointer 426 leading to the correct location in the dispatch table more quickly than searching all possible locations within the dispatch table.

[0559] In the event of a clash in the hash table, perhaps because the same location is needed for two interface methods, or perhaps due to being called by two different threads in a multi-threaded environment, the hash table will point to a method designed to ‘sort out’ the clash and direct the caller to the appropriate location or locations.

[0560] In any or all of the aforementioned, certain features of the present invention have been implemented using computer software. However, it will of course be clear to the skilled person that any of these features may be implemented using hardware or a combination of hardware and software. Furthermore, it will be readily understood that the functions performed by the hardware, the computer software, and such like are performed on or using electrical and like signals.

[0561] Features which relate to the storage of information may be implemented by suitable memory locations or stores. Features which relate to the processing of information may be implemented by a suitable processor or control means, either in software or in hardware or in a combination of the two.

[0562] In any or all of the aforementioned, the invention may be embodied in any, some or all of the following forms: it may be embodied in a method of operating a computer system; it may be embodied in the computer system itself; it may be embodied in a computer system when programmed with or adapted or arranged to execute the method of operating that system; and/or it may be embodied in a computer-readable storage medium having a program recorded thereon which is adapted to operate according to the method of operating the system.

[0563] As used herein throughout the term “computer system” may be interchanged for “computer”, “system”, “equipment”, “apparatus”, “machine” and like terms. The computer system may be or may include a virtual machine.

[0564] In any or all of the aforementioned, different features and aspects described above, including method and apparatus features and aspects, may be combined in any appropriate fashion.

[0565] It will be understood that the present invention(s) has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.

[0566] Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

[0567] Other Information

[0568] Related Patents:

[0569] U.S. Pat. No. 5,367,685

REFERENCES

[0570] [1] “The Annotated C++ Reference Manual” by M. Ellis and B. Stroustrup, Addison Wesley (ISBN 0-201-51459-1) pages 217-237

[0571] [2] “The Java Programming Language” by K. Arnold and J. Gosling, Addison Wesley (ISBN 0-201-63455-4) chapter 4

[0572] [3] “The Java Virtual Machine Specification” by T. Lindholm and F. Yellin, Addison Wesley (ISBN 0-201-63452-X) pages 258-260, 403-405

[0573] [4] “Modern Compiler Implementation in Java” by A. W. Appel, Chapter 14, published Mar. 12, 1998

[0574] Java is a trademark of Sun Microsystems.

[0575] Agent's Reference No. 5: Computer System, Computer-Readable Storage Medium and Method of Operating Same, and Method of Operating That System

[0576] The present invention relates to a computer system and method of operating the same, to so-called return barriers for garbage collection in a computer, to a computer-readable storage medium, computer system, Virtual Machine or similar apparatus incorporating the same, and to an operating method. In a preferred embodiment, the invention relates to return barriers for garbage collection.

[0577] The invention has general applicability to Run-time Environments. More particularly, it is applicable to automatic dynamic memory management.

[0578] The present invention relates in one aspect to the use of so-called return barriers to minimize blocking while a thread's stack is being inspected by a concurrent garbage collector.

[0579] In certain computer systems, as shown schematically in FIG. 5A, data is stored in (activation) frames 29000 in an (activation) stack 29002 with the most recent activity being regarded as stored in the lowermost frame in the stack (although it could equally be in the uppermost). Garbage collection involves tracing the connectivity of all cells. Any that are not traced in this way are therefore invisible and cannot contain any information of relevance. Those cells can thus be released for use (as additional memory) in the system. The garbage collector traces every frame in every thread's stack.

[0580] A typical activation stack is shown in more detail in FIG. 5B. For convenience, the stack (29102 as shown in this figure) may be regarded as a memory store in which items are successively added from top to bottom so that the ‘youngest’ items are nearest the bottom of the stack. Each stack consists of a number of frames (29104 as shown in this figure), each frame containing data and one or more references or frame pointers to other frames or stacks. Typically, each frame will contain a pointer 29106 to the previous, that is older, frame and a return pointer 29108 into the calling procedure's executable code.

[0581] In the tracing process, all of the pointers or references (to memory objects stored in the memory heap) in each frame of the stack need to be looked at. For that to happen, it has been necessary up to now for the thread to be paused while tracing is carried out through the whole of that thread's stack. That in turn requires the garbage collection process to be halted while it waits for the thread to give permission for the garbage collector to interrogate its frames.

[0582] In other words, any references contained in a thread of control's activation stack need to be treated as part of a tracing concurrent garbage collector's root set, and need to be examined during the garbage collection process. It is vitally important that the thread being inspected does not alter any information that the garbage collector (“GC”) thread could be examining. One way of achieving this is to suspend execution of the thread to be inspected, allow the GC to inspect the entire contents of the stack, and then to resume execution of the inspected thread.

[0583] The main problem with the above technique which has been identified pursuant to the present invention is that the amount of time a thread will be suspended is determined by the size of the thread's stack, and suspending a thread for too long will lead to noticeable pauses. The technique described by this patent allows a thread to continue execution, provided preferably that it is not trying to use a portion of the stack that the GC thread is interested in.

[0584] According to one aspect there is provided a method of operating a computer system including at least one, preferably a plurality or even multiplicity of, activation stack(s) arranged to be managed by its (their) respective thread of control, the method including the steps of:

[0585] executing the thread using its activation stack; and

[0586] permitting a further thread to access, preferably simultaneously, the same activation stack. By this feature the degree of concurrency in the system can be enhanced.

[0587] In order to prevent possible problems of contention, execution of the thread may be paused for only part of the time that the further thread is accessing the activation stack of the thread. Hence, there is provided a mechanism whereby any operation which for example wants to examine the contents of another thread's stack can do so without causing that thread to be halted unduly.

[0588] For the same reason, the thread and the further thread may be prevented from accessing the same activation frame at the same time.

[0589] Similarly, execution by the thread of its current activation frame may be paused for the time it takes the further thread to access the same activation frame.

[0590] A barrier may be provided to selectively prevent return of the thread from its current activation frame into the frame currently being accessed by the further thread. In typical practical situations one thread would be expected to execute a given activation frame more slowly than the time that the other thread (in this case referred to as the further thread) would take to access it. Accordingly, it is not expected that the return barrier will come into operation particularly frequently. However, it is most important in ensuring that no conflict occurs.

[0591] The preventive effect of the barrier may be selective upon whether the further thread is currently accessing the parent frame of the thread's current activation frame. Preferably a barrier is provided to prevent return of the thread from its current activation frame into the parent activation frame of the current activation frame of the thread if the further thread is currently accessing the parent activation frame.

[0592] The barrier for the current activation frame may be provided before the further thread changes the frame it is currently accessing. By this feature a form of ‘air lock’ is created.

[0593] A second further thread or even a larger number of further threads may be permitted to access, preferably simultaneously, the same activation stack. In one preferred embodiment the further thread is the thread of a, preferably concurrent, garbage collector, the second further thread is the thread of a debugger, and the thread is any other arbitrary thread within the computer system.

[0594] For the same activation frame different barriers may be provided for the further and the second further threads. This can allow different functions to be performed.

[0595] Preferably, the barriers are arranged to allow a group of the barriers to be associated with a single activation frame. For each different barrier a descriptor block may be provided, the descriptor blocks preferably being linked to form a linked list. This is a convenient way of coping with multiple barriers.

[0596] One particularly important use of the present invention is in garbage collection. Hence, the further thread may be the thread of a, preferably concurrent, garbage collector.

[0597] Preferably, in a single cycle the garbage collector makes an initial and at least one further scan of the frames of the stack.

[0598] Preferably, in the at least one further scan only frames which have mutated are scanned. Scanning can be time-consuming and accordingly this feature can reduce the time taken for garbage collection.

[0599] A record may be kept of the extent to which, in the at least one further scan, the frames need to be re-scanned. This record can be used to determine the point at which subsequent scans can be started. More specifically, the record may be of which frames could have mutated or been created between two given scans. Re-scanning may be from the youngest frame which has an intact return barrier to the current activation frame.
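The re-scan range described above might be computed along the following lines. This is only a sketch under stated assumptions: the Frame class, its barrier_intact flag and the function frames_to_rescan are hypothetical names, and the sketch assumes that the youngest frame with an intact barrier is itself included in the re-scan, since the thread may have continued to execute in it after the earlier scan.

```python
class Frame:
    """Hypothetical stack frame; barrier_intact marks a frame whose return
    barrier from an earlier scan has not yet been removed by a return."""

    def __init__(self, name, parent=None, barrier_intact=False):
        self.name = name
        self.parent = parent
        self.barrier_intact = barrier_intact


def frames_to_rescan(current):
    """Walk from the current frame towards older frames, stopping at the
    youngest frame that still has an intact return barrier (inclusive)."""
    pending = []
    frame = current
    while frame is not None:
        pending.append(frame)
        if frame.barrier_intact:
            break  # older frames cannot have been re-entered without tripping a barrier
        frame = frame.parent
    return pending
```

If no frame on the stack carries an intact barrier, the walk reaches the oldest frame and the whole stack is returned, which corresponds to an initial scan.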

[0600] In a closely related aspect the present invention provides a computer system including:

[0601] at least one, preferably a plurality or even multiplicity of, activation stack(s) arranged to be managed by its (their) respective thread of control;

[0602] means (preferably a run time engine) for executing the thread using its activation stack; and

[0603] means for permitting a further thread to access, preferably simultaneously, the same activation stack.

[0604] Preferably, the computer system further includes means for pausing (or, for example the run time engine, is further adapted to pause) execution of the thread for only part of the time that it takes the further thread to access the activation stack of the thread.

[0605] The computer system may further include means for preventing (or may further be adapted to prevent) the thread and the further thread from accessing the same activation frame at the same time.

[0606] The computer system may further include means for pausing (or may further be adapted to pause) execution by the thread of its current activation frame for the time it takes the further thread to access the same activation frame.

[0607] The computer system may further include means for providing (or may further be adapted to provide) a barrier to selectively prevent return of the thread from its current activation frame into the frame currently being accessed by the further thread.

[0608] The computer system may further include means for providing (or may further be adapted to provide) a barrier to prevent return of the thread from its current activation frame into the parent activation frame of the current activation frame of the thread if the further thread is currently accessing the parent activation frame.

[0609] The computer system may further include means for providing (or may further be adapted to provide) the barrier for the current activation frame before the further thread changes the frame it is currently accessing.

[0610] The computer system may further include means for permitting (or may further be adapted to permit) a second further thread to access the same activation stack.

[0611] The computer system may be adapted to provide for the same activation frame different barriers for the further and the second further threads.

[0612] Preferably, the barriers are arranged to allow a group of the barriers to be associated with a single activation frame. The computer system may be adapted to provide for each different barrier a descriptor block, the descriptor blocks being linked to form a linked list.

[0613] The further thread may be the thread of a garbage collector.

[0614] The garbage collector may be adapted to make, in a single cycle, an initial and at least one further scan of the frames of the stack.

[0615] The computer system may be adapted so that in the at least one further scan only frames which have mutated are scanned.

[0616] The computer system may further include means for keeping (or may further be adapted to keep) a record of the extent to which, in the at least one further scan, the frames need to be re-scanned.

[0617] The invention has especial utility in the context of garbage collection.

[0618] In broad terms, it is proposed to solve the various problems mentioned earlier in connection with garbage collection by suspending the non-GC thread's execution only for as long as it takes to examine the youngest activation frame, and editing the frame's return address to refer to some special code. Then the thread is allowed to continue execution while successive callers' activation frames are examined. Once examination of a particular frame is completed, before moving on to the next, the frame's return address is edited to refer to the same special code mentioned earlier.

[0619] Garbage collection is a relatively rapid event by comparison with the speed of execution of a typical procedure call. Thus, it is relatively rare (though, of course, certainly possible) for a return from a procedure call to occur before the garbage collection is complete. In such a rare event the special code is activated; it intercepts attempts to return from an activation frame back to the caller's frame. If the caller's frame is currently being examined by the GC thread, the non-GC thread is compelled to wait until the GC thread has moved on to another frame.
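The scanning order described in the two preceding paragraphs can be modelled, very roughly, without real threads. The following single-threaded sketch is an illustration only: Frame, scan_stack and return_blocked are hypothetical names, the barrier is reduced to a boolean flag rather than an edited return address, and suspension and resumption of the thread are merely recorded in a log.

```python
class Frame:
    """Hypothetical frame; 'barrier' is True once the frame's return address
    would have been redirected to the special code."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.barrier = False


def scan_stack(youngest, log):
    """Generator modelling the GC thread: each yielded frame is the one
    currently being examined. The thread is 'suspended' only while the
    youngest frame is examined."""
    log.append("suspend thread")
    frame = youngest
    while frame is not None:
        yield frame                 # the GC is examining this frame
        frame.barrier = True        # lay the barrier before moving on
        if frame is youngest:
            log.append("resume thread")
        frame = frame.parent


def return_blocked(current_frame, being_scanned):
    """Model of the special code's check: a return is held up only while the
    caller's frame is the one the GC is examining."""
    return current_frame.barrier and current_frame.parent is being_scanned
```

Stepping the generator by hand shows the intended behaviour: after the youngest frame has been examined the thread is resumed, and a return is refused only for the brief period in which the GC is examining the caller's frame.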

[0620] In this context, the invention further provides a method of improving the concurrent garbage collection of reference data contained within a thread stack in a computer system, wherein the thread is only paused for the purpose of garbage collection for the time it takes to examine the current activation frame, rather than the entire stack.

[0621] Preferably measures are taken to prevent the return of an outstanding procedure call into an activation frame whose contents are currently being inspected by the garbage collector until such time as the garbage collector has completed the inspection of that frame.

[0622] Analogous apparatus may also be provided within the scope of the invention, including a garbage collector and means for pausing the thread for the purpose of garbage collection only for the time it takes to examine the current activation frame, rather than the entire stack.

[0623] In a closely related aspect, there is provided a computer or computer system including a garbage collector and means for pausing the thread for the purpose of garbage collection only for the time it takes to examine the current activation frame, rather than the entire stack.

[0624] In a further closely related aspect, there is provided a computer-readable storage medium having a program recorded thereon, the program providing a method of improving the concurrent garbage collection of reference data contained within a thread stack in a computer system, wherein the thread is only paused for the purpose of garbage collection for the time it takes to examine the current activation frame, rather than the entire stack.

[0625] In a further closely related aspect, there is provided a computer when programmed so as to provide a method of improving the concurrent garbage collection of reference data contained within a thread stack in a computer system, wherein the thread is only paused for the purpose of garbage collection for the time it takes to examine the current activation frame, rather than the entire stack.

[0626] In a closely related aspect, the invention provides a method of improving concurrent garbage collection in a thread stack of a computer system, including the steps of: enabling the garbage collection thread to access the thread of interest in the stack; suspending the execution of the thread of interest only for as long as necessary for the most active activation frame to be examined; editing the return address of the frame to a return barrier code; allowing the thread of interest to continue execution while successive activation frames are examined; and editing the return address of each frame to the same return barrier code before moving on to the next frame.

[0627] The barrier code may be used to prevent the return of an outstanding procedure call into an activation frame whose contents are currently being inspected by the garbage collector until such time as the garbage collector has completed the inspection of that frame. The invention thereby achieves the objective of reducing the time that the thread of interest is suspended. It can also maximize the degree of concurrency in a garbage collection system and improve the illusion of concurrency.

[0628] In a further closely related aspect, there is provided a computer-readable storage medium having a program recorded thereon, the program providing a method of improving concurrent garbage collection in a thread stack of a computer system, including the steps of: enabling the garbage collection thread to access the thread of interest in the stack; suspending the execution of the thread of interest only for as long as necessary for the most active activation frame to be examined; editing the return address of the frame to the return barrier code; allowing the thread of interest to continue execution while successive activation frames are examined; and editing the return address of each frame to the same return barrier code before moving on to the next frame.

[0629] The present invention extends to a computer when programmed according to the above method.

[0630] The present invention also extends to a computer system including at least one, preferably a plurality or even multiplicity of, activation stack(s) arranged to be managed by its (their) respective thread of control, when programmed so as to:

[0631] execute the thread using its activation stack; and

[0632] permit a further thread to access the same activation stack.

[0633] The present invention also extends to a computer-readable storage medium having a program recorded thereon, the program providing the above method.

[0634] In a closely related aspect, there is provided a computer-readable storage medium having a program recorded thereon, the program providing a method of operating a computer system, the computer system including at least one activation stack arranged to be managed by its respective thread of control, the method including the steps of:

[0635] executing the thread using its activation stack; and

[0636] permitting a further thread to access the same activation stack.

[0637] The invention extends to a Virtual Machine including the above computer or computer system.

[0638] The invention extends to a Virtual Machine when operated by the above method.

[0639] The invention extends to a Virtual Machine when operated by means of the above computer-readable storage medium.

[0640] Any, some or all of the different features of the various aspects of the present invention may be applied to the other aspects.

[0641] Preferred features of the present invention will now be described, purely by way of example, with reference to the accompanying drawings, in which:

[0642] FIG. 5A is a schematic illustration of data storage in a stack;

[0643] FIG. 5B shows an activation stack;

[0644] FIG. 5C illustrates how checks are made on references in a frame;

[0645] FIG. 5D shows the arrangement of data in a procedure call frame;

[0646] FIG. 5E shows the execution of a procedure; and

[0647] FIG. 5F shows the arrangement of the contents of a barrier descriptor block.

[0648] The invention will first be described in general terms and will then be followed by a more comprehensive description of a particular manner in which the invention may be put into effect.

[0649] With reference to FIG. 5A, in the present invention, it has been recognised that the thread need only be paused by the run time engine for as long as it takes to examine the most recent (that is the youngest or most active) activation frame 29004 in stack 29002, not for the time required to examine all activation frames. The frame is checked for references or pointers and the return address is edited by substituting for the previous return address a special code (the return barrier code) which is sent into the program 29006 itself, which is operating on data in what is known as the heap 29008. The special code links the old and the new return addresses to that frame. If the garbage collector is operating on that frame at the time that the data is to be returned, the return barrier code prevents corruption of the data in that frame by pausing return until such time as the garbage collector has moved on to another frame.

[0650] The success of the invention relies on the realization that only the youngest frame will have been changed or mutated as a result of work being done on it. When the thread is paused, for example for garbage collection, checks are made on all the references in the youngest frame. With reference to FIG. 5C, the frame pointer and return address are copied into a separate store (29204) and the return address is edited into the special code. The thread can then continue in the youngest frame. When the youngest frame wants to return to the previous frame, it may not be able to do so because the GC may be active at that location. Under those circumstances, the return barrier diverts the request to the special code, thereby preventing the thread from returning to the previous frame until the GC has finished. The GC lays down the return barrier and the thread removes it when safe to do so.

[0651] Most returns are not hindered because the GC will have moved on from the youngest frame and will be investigating frames some distance removed. There could be several return barriers in a stack, depending on the number of threads trying to access the stack at any one time. It follows that subsequent threads do not need to go back through as much of the stack as previously.

[0652] A more detailed description is now provided, at first of a generic return barrier and then later of the implementation of the return barrier in the context of garbage collection.

[0653] A thread's stack is composed of a sequence of frames. A frame contains all the information related to one particular outstanding procedure call. All frames contain the following:

[0654] (a) A return address, which is the address of some executable code which indicates where program execution should resume from once the procedure call associated with the frame has returned (a return address is a specific kind of instruction pointer); and

[0655] (b) A parent frame pointer, which is a particular type of pointer to memory which points to an address which indicates the frame of the calling procedure (a parent frame pointer is a specific kind of frame pointer).

[0656] Reference is directed to FIG. 5D for an indication of the arrangement of data in a procedure call frame 29302 containing procedure parameters 1, 2 . . . through n, (29304), a return address 29306, a parent frame pointer 29308 and a set of local data 29310.

[0657] Hence, there exists the notion of the current or youngest frame, which describes the outstanding procedure call the thread is currently executing using procedure call stack 29402, as illustrated schematically in FIG. 5E. The frame pointer 29404 of this youngest frame 29408 will typically be held in a particular machine register (known as the frame register). Successive parent frame pointers such as 29406 refer to increasingly older frames 29410, 29412.

[0658] The procedure undertaken in the above illustration may be represented as follows:

procedure C (pc1)
begin
  . . .
end;

procedure B (pb1, pb2)
begin
  C(z);
  . . . /* pt 2 */
end;

procedure A (pa1, pa2, pa3)
begin
  B(x, y);
  . . . /* pt 1 */
end;

[0659] A generic return barrier mechanism is now described (that is a mechanism which is not restricted to use in the context of a garbage collector), by which it can be arranged to have a series of arbitrary functions executed whenever a procedure executing in the context of a particular barriered frame attempts to return. This mechanism incurs no overhead if no barrier is present.

[0660] The return barrier mechanism may have a number of different clients, possibly (or even probably) simultaneously (so that more than one return barrier per frame may be required). More detailed description is provided herein of one particular client, the garbage collector, where the function to be executed is effectively a halting function. Another possible client is a debugging interface, where rather than being a halting function the function concerns the provision of information to the debugger. The important feature in this context is the ability to interrupt the return mechanism.

[0661] Laying down a return barrier is the mechanism whereby we arrange for an arbitrary function p to be executed when the procedure executing in frame f attempts to return. As mentioned previously, this function typically depends upon the ultimate client; for example, there is a specific function (referred to later as code B) which can be used for garbage collection. The general arrangement of the contents of a barrier descriptor block 29502 is shown schematically in FIG. 5F; one barrier descriptor block is provided in memory per return barrier. In the following pseudo-code, which describes the laying down of the return barrier, the special code referred to earlier and described in more detail later is referred to as C.

[0662] allocate from memory a barrier descriptor block d.

[0663] let d's original return address be f's return address.

[0664] if f's return address is C,

[0665] ;there is already at least one ‘C’ barrier laid in this

[0666] ;frame, so f's frame pointer is really a barrier

[0667] ;descriptor block.

[0668] let d's barrier link be f's parent frame.

[0669] let d's original parent frame be the original parent frame in the barrier descriptor block pointed to by f's parent frame.

[0670] ;the above two steps serve to establish another link in the

[0671] ;linked list of the barrier descriptor blocks, with the

[0672] ;barrier links pointing to successive boxes and d's barrier

[0673] ;link being at the front of the linked list.

[0674] else

[0675] ;there is no return barrier in this frame, so f's frame

[0676] ;pointer really is a frame pointer.

[0677] let d's barrier link be NULL.

[0678] ;so that the barrier link is a pointer to nowhere else in

[0679] ;the linked list.

[0680] let d's original parent frame be f's parent frame.

[0681] ;in other words save f's parent frame into d.

[0682] let f's return address point to Code C.

[0683] ;this establishes the barrier.

[0684] endif

[0685] let d's barrier function be p.

[0686] let f's parent frame be d.

[0687] The idea as expressed above is that multiple barriers can be laid down in one particular frame, expressed as a chain of descriptor blocks linked via the barrier link fields. Each block has a copy of the original frame pointer, but each could have a different barrier function, so that each can have a different client.
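By way of illustration only, the laying down of a return barrier described in the pseudo-code above may be sketched in Python as follows. The names Frame, Descriptor, CODE_C and lay_barrier are assumptions introduced for this sketch; a sentinel string stands in for the address of code C, and ordinary object references stand in for frame pointers.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union

CODE_C = "CODE_C"  # hypothetical sentinel standing in for the address of code C


@dataclass
class Frame:
    name: str
    return_address: object                      # a real address, or CODE_C once barriered
    parent: Union["Frame", "Descriptor", None]  # parent frame, or a descriptor once barriered


@dataclass
class Descriptor:
    original_return_address: object
    original_parent: Optional[Frame]
    barrier_link: Optional["Descriptor"]
    barrier_function: Callable[[Optional[Frame]], None]


def lay_barrier(f: Frame, p) -> Descriptor:
    """Lay a return barrier with barrier function p in frame f."""
    if f.return_address == CODE_C:
        # At least one barrier is already laid, so f.parent is really the
        # descriptor at the front of the chain.
        head = f.parent
        d = Descriptor(CODE_C, head.original_parent, head, p)
    else:
        # No barrier yet: save the real return address and parent frame in d.
        d = Descriptor(f.return_address, f.parent, None, p)
        f.return_address = CODE_C  # this establishes the barrier
    f.parent = d  # d becomes the front of the linked list
    return d
```

In this sketch a second call to lay_barrier on the same frame adds a descriptor to the front of the chain, and each descriptor carries the same original parent frame.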

[0688] It will be understood that, for example in the context of garbage collection (which executes at a relatively rapid rate), an attempted return from the youngest activation frame is not particularly likely to occur. However, when the procedure executing in the context of a barriered frame does attempt to return, the code at C will be executed. It is responsible for executing each of the barrier functions in turn, and then completing the return as if no barrier had been present. Code C (the “special code”) is described by the following section of pseudo-code; it is to be noted that this code is typically generic to all return barriers. The section includes reference to a linked list, which is a series of linked pointers.

[0689] ;The procedure return mechanism means that the frame register

[0690] ;contains a pointer to the first barrier descriptor block in the

[0691] ;chain (linked list).

[0692] let d be the descriptor block in the frame register.

[0693] invoke d's barrier function p on d's original parent frame.

[0694] ;(each barrier descriptor block will have the same parent frame reference)

[0695] let r be d's original return address.

[0696] ;note that the original return address may point to code C.

[0697] if d's barrier link is NULL,

[0698] ;end of chain (linked list) reached—continue normal

[0699] ;execution.

[0700] let frame register be d's original parent frame.

[0701] else

[0702] ;another barrier in the chain.

[0703] let frame register be d's barrier link.

[0704] endif

[0705] de-allocate barrier descriptor block d.

[0706] continue execution from address r.

[0707] The above describes the preferred embodiment of a generic return barrier mechanism.
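By way of illustration only, code C may be sketched in Python as follows. The names Descriptor, CODE_C, code_c and return_through_barriers are assumptions of this sketch. Because Python cannot jump to an address, "continue execution from address r" is modelled by returning r to a small driver loop, which re-enters code C for as long as the saved return address is itself code C.

```python
from dataclasses import dataclass
from typing import Callable, Optional

CODE_C = "CODE_C"  # hypothetical sentinel standing in for the address of code C


@dataclass
class Descriptor:
    original_return_address: object
    original_parent: object
    barrier_link: Optional["Descriptor"]
    barrier_function: Callable[[object], None]


def code_c(frame_register):
    """One pass through code C; returns (new frame register, address to continue from)."""
    d = frame_register
    d.barrier_function(d.original_parent)
    r = d.original_return_address  # may itself be CODE_C
    if d.barrier_link is None:
        frame_register = d.original_parent  # end of chain: restore the real parent frame
    else:
        frame_register = d.barrier_link     # another barrier in the chain
    # De-allocation of d is left to the Python runtime in this sketch.
    return frame_register, r


def return_through_barriers(frame_register):
    """Driver standing in for 'continue execution from address r'."""
    address = CODE_C
    while address == CODE_C:
        frame_register, address = code_c(frame_register)
    return frame_register, address
```

With a chain of two descriptors, the barrier function at the front of the chain runs first, and the return then completes to the original return address with the original parent frame in the frame register.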

[0708] In the specific context of garbage collection, the garbage collector utilises return barriers to ensure that no attempt is made by another thread to continue execution in a frame which is currently being scrutinised by the GC, while allowing execution in frames that are not being examined.

[0709] The implementation of the return barrier is now described. The garbage collector will investigate the contents of a thread's stack in the following way. Let gcf be a system-wide global variable which contains a reference to the activation frame currently being inspected by the GC thread. Only the GC can alter this, although it can be read by any thread. Hence gcf expresses the concept of the garbage collector focus, the frame which the garbage collector is currently examining.

[0710] The GC thread examines a thread t's stack as described in the following section of pseudo-code:

[0711] suspend t.

[0712] ;in other words execution of the entire thread is suspended

[0713] let gcf be t's youngest (top-most) frame.

[0714] inspect the contents of frame gcf.

[0715] lay down a return barrier B in gcf.

[0716] ;(by altering gcf's return address to point to the barrier

[0717] ;intercept code)

[0718] ;note that it is important that the return barrier is laid down

[0719] ;in the parent frame before the younger frame is allowed into

[0720] ;the parent frame, otherwise the present youngest frame could

[0721] ;behave unexpectedly

[0722] let gcf be gcf's parent (caller's) frame.

[0723] allow t to resume execution.

[0724] while gcf is not NULL do

[0725] inspect the contents of frame gcf

[0726] lay down a return barrier B in gcf

[0727] let gcf be gcf's parent frame.

[0728] endwhile

[0729] In this way the garbage collector can proceed through all of the frames of thread t's stack, from the youngest to the oldest.
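By way of illustration only, the scan of a thread's stack may be sketched in Python as a single-threaded simulation. The names Frame, Collector, suspend and resume are assumptions of this sketch; suspension and resumption of the thread are passed in as callbacks, and None stands in for NULL.

```python
class Frame:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent       # caller's frame, or None for the oldest frame
        self.has_barrier = False


class Collector:
    def __init__(self):
        self.gcf = None            # garbage collector focus
        self.inspected = []

    def inspect(self, frame):
        self.inspected.append(frame.name)

    def lay_barrier(self, frame):
        frame.has_barrier = True   # stands in for redirecting the return address

    def scan(self, youngest, suspend, resume):
        suspend()                  # the thread is stopped only for its youngest frame
        self.gcf = youngest
        self.inspect(self.gcf)
        self.lay_barrier(self.gcf)
        self.gcf = self.gcf.parent
        resume()                   # the thread runs while the older frames are scanned
        while self.gcf is not None:
            self.inspect(self.gcf)
            self.lay_barrier(self.gcf)
            self.gcf = self.gcf.parent
```

The point the sketch is intended to show is that the thread is suspended only while its youngest frame is inspected and barriered; the remaining frames are scanned after the thread has resumed.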

[0730] The barrier intercept code B is invoked in the relatively unlikely event that a procedure attempts to return from a frame into the parent (caller's) frame (pf), and it will be supplied with a pointer to the frame it is trying to return into. It ensures that no attempt is made to return into a frame that the GC is currently inspecting (that is, it traps attempts to return to the parent frame):

[0731] while pf==gcf (this is a relatively unlikely event) do

[0732] wait a short time

[0733] ;that is, code B keeps on waiting until the non-GC

[0734] ;thread can return safely

[0735] endwhile

[0736] Once the GC thread's focus (point of interest) has moved on, the non-GC thread can allow its return to caller to complete safely.
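By way of illustration only, the waiting loop of code B may be sketched in Python as follows. The function name barrier_b and its parameters read_gcf and wait are assumptions of this sketch; in a real multi-threaded system wait might be a short sleep, whereas here it is passed in so that the behaviour can be exercised deterministically.

```python
def barrier_b(pf, read_gcf, wait):
    """Hold the returning thread while the GC focus is the frame being returned into.

    pf is the parent frame, read_gcf() returns the current GC focus, and wait()
    stands in for 'wait a short time'. Returns the number of waits performed.
    """
    waits = 0
    while pf is read_gcf():
        wait()
        waits += 1
    return waits
```

When the GC focus is some other frame, the loop body is never entered and the only cost is the single comparison.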

[0737] It is possible that return barriers established by earlier thread inspections could still be intact on subsequent inspections. In that case the GC thread does not try to establish a barrier if one is already present. While the GC is not running, gcf is set to point to an impossible value. Hence any return barriers are ignored. Hence when the GC is not running the return barrier mechanism is self-cleaning; although the return barriers remain in place the only overhead involved in running them is execution of pf==gcf, which is a trivial overhead.

[0738] The particular GC being employed may require that a particular thread's stack be examined multiple times in a single GC cycle, until no new activity is detected in the relevant thread (incidentally, this process is guaranteed to terminate at some point, since the heap is of a finite size). With the technique thus far described, each frame in each stack would need to be examined the appropriate number of times. However, in one preferred variant, now described, the barrier function B is enhanced to keep a record (in fact a single record per activation stack) of the most recent frame it had been invoked from. This record effectively represents the “high water mark” of activity on the stack. It recedes (moves towards older frames) as successive returns are made; however, calls to fresh activation frames do not alter the value of the record, since there will be no return barriers in such frames. When the GC examines a stack, it can assume that all frames older than the most recent frame the barrier function had been invoked from could not have changed, and so are not re-scanned. Hence the first scan involves the GC examining each and every frame in each and every stack. Subsequent re-scanning occurs from the youngest frame on the stack up to and including the youngest frame that still has a return barrier originally laid down in previous scans. Frames older than this cannot have changed in the interval between scans.

[0739] Details of the enhancement required to support minimal re-scanning are now provided. In addition to the variables described earlier, each thread has a variable tl-hwf, which at all times holds the youngest frame in the thread which has an intact GC return barrier.

[0740] The following enhanced technique examines a thread t's stack:

[0741] suspend t

[0742] let oldhwf be t's tl-hwf value.

[0743] if this is the first time this thread is being scanned in this GC cycle, then

let scanlimit be NULL

else

let scanlimit be oldhwf

endif

let gcf be t's youngest frame.

inspect the contents of frame gcf

if gcf is not the same as oldhwf then

lay down a return barrier B in gcf.

let laybarriers be TRUE.

else

let laybarriers be FALSE.

endif

let t's tl-hwf value be gcf

if scanlimit is the same as gcf, then

let finished be TRUE.

else

let finished be FALSE.

let gcf be gcf's parent frame.

endif

allow t to resume execution.

while finished is not TRUE do

inspect the contents of frame gcf

if laybarriers is TRUE, then

if gcf is not the same as oldhwf, then

lay down a return barrier B in gcf

else

let laybarriers be FALSE.

endif

endif

if scanlimit is the same as gcf, then

let finished be TRUE

else

let gcf be gcf's parent frame.

endif

endwhile

[0744] Code B is enhanced so that it maintains the executing thread's copy of tl-hwf:

[0745] while pf==gcf do

[0746] wait a short time

[0747] endwhile

[0748] let this thread's tl-hwf be pf.

[0749] This permits updating of the high water mark, where “this thread” refers to the thread executing Code B at the time (several threads may be doing this simultaneously).
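By way of illustration only, the enhanced scan and the enhanced code B may be sketched together in Python as a single-threaded simulation. The names Frame, ThreadState, scan and return_through_barrier_b are assumptions of this sketch; the waiting loop of code B and the suspension of the thread are omitted, and the check for a None frame in the loop condition is an addition of the sketch rather than part of the pseudo-code.

```python
class Frame:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.has_barrier = False


class ThreadState:
    def __init__(self, youngest):
        self.youngest = youngest  # current top-of-stack frame
        self.tl_hwf = None        # youngest frame with an intact GC return barrier


def scan(t, first_scan_in_cycle, inspected):
    """Scan t's stack, stopping at the high water mark on a re-scan."""
    oldhwf = t.tl_hwf
    scanlimit = None if first_scan_in_cycle else oldhwf
    gcf = t.youngest
    inspected.append(gcf.name)
    laybarriers = gcf is not oldhwf
    if laybarriers:
        gcf.has_barrier = True
    t.tl_hwf = gcf
    finished = scanlimit is gcf
    if not finished:
        gcf = gcf.parent
    while not finished and gcf is not None:  # None guard added by this sketch
        inspected.append(gcf.name)
        if laybarriers:
            if gcf is not oldhwf:
                gcf.has_barrier = True
            else:
                laybarriers = False          # older frames still carry barriers
        if scanlimit is gcf:
            finished = True
        else:
            gcf = gcf.parent


def return_through_barrier_b(t):
    """Model a return from the youngest frame through code B (waiting loop omitted)."""
    pf = t.youngest.parent
    t.youngest = pf
    t.tl_hwf = pf  # the high water mark recedes towards older frames
```

In this sketch, after one return and one fresh call, a re-scan inspects only the fresh frame and the frame recorded as the high water mark; older frames are not visited again.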

[0750] The technique described above can allow a Concurrent GC implementation to minimise the amount of time it spends interacting with each thread, which in turn allows it to re-scan all threads more quickly, thus allowing entire GC cycles to complete in less time than it would take if all thread stacks had to be re-scanned in their entirety.

[0751] In summary, two (amongst other) fundamental aspects have been described. Firstly,

[0752] a generic return barrier mechanism is provided, allowing arbitrary actions to be undertaken when a procedure returns. The mechanism does not cause excessive overhead to occur when the return barrier is not being used. Only the current frame has to pause when necessary; thereafter the procedure is self regulating in the sense that it can proceed at its own pace; little synchronization or handshaking is required.

[0753] Secondly, specifically in the context of concurrent garbage collection, one can use a return barrier to ensure that no attempt is made to re-enter a frame currently under scrutiny.

[0754] Concomitant with this is the ability to allow the GC to inspect a thread's stack while that thread is still running. A further feature which has been described is that, should the thread's stack be rescanned, it is possible to determine which portion of the thread has to be looked at; this is achieved through a high water mark mechanism.

[0755] In any or all of the aforementioned, certain features of the present invention have been implemented using computer software. However, it will of course be clear to the skilled man that any of these features may be implemented using hardware or a combination of hardware and software. Furthermore, it will be readily understood that the functions performed by the hardware, the computer software, and such like are performed on or using electrical and like signals.

[0756] Features which relate to the storage of information may be implemented by suitable memory locations or stores. Features which relate to the processing of information may be

[0757] implemented by a suitable processor or control means, either in software or in hardware or in a combination of the two.

[0758] In any or all of the aforementioned, the invention may be embodied in any, some or all of the following forms: it may be embodied in a method of operating a computer system; it may be embodied in the computer system itself; it may be embodied in a computer system when programmed with or adapted or arranged to execute the method of operating that system; and/or it may be embodied in a computer-readable storage medium having a program recorded thereon which is adapted to operate according to the method of operating the system.

[0759] As used herein throughout the term ‘computer system’ may be interchanged for ‘computer,’ ‘system,’ ‘equipment,’ ‘apparatus,’ ‘machine,’ and like terms. The computer system may be or may include a virtual machine.

[0760] In any or all of the aforementioned, different features and aspects described above, including method and apparatus features and aspects, may be combined in any appropriate fashion.

[0761] It will be understood that the present invention(s) has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.

[0762] Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

[0763] Agent's Reference No. 6 - Computer System, Computer-Readable Storage Medium and Method of Operating Same, and Method of Operating That System

[0764] The invention relates to a method of and apparatus for examining memory in a computer system to allow a section of compiled code to be deleted, and to a method of and apparatus for deleting compiled code in a computer system. The invention finds particular (but not exclusive) application in the environment of a unified stack virtual machine in which stack walking allows compiled code to be deleted. In a preferred embodiment, the invention relates to stack walking to allow compiled code deletion in the multi-threaded environment of a unified stack virtual machine.

[0765] The invention applies preferably to virtual machines where compiled portions of the code being run in the virtual machine appear and need to be removed at various times in the execution of the virtual machine; for example, in a dynamically compiling virtual machine.

[0766] When executing code using a virtual machine, we have found that it is advantageous to produce a compiled version of some or all of the emulated code (see Agent's Reference No. 1 in this specification). We believe that it will sometimes be desirable or necessary to subsequently remove some or all of these compiled versions. Also we believe that it would be advantageous to use a single stack to support the stack requirements of both the emulated machine and the virtual machine code itself, to use a native call instruction to perform the equivalent of an emulated call (invoke), and to use a native return instruction to perform the equivalent of an emulated return in the code being run on the virtual machine.

[0767] Where a computer system has finished using memory which it has taken to perform a particular function we have found that it is in the interests of speed and efficiency that the used memory is returned as soon as possible for further use.

[0768] Currently known techniques for virtual machines would require that one or more of the optimising techniques listed in the background information section be not taken advantage of, or require explicit checks to be used which impair the efficiency of the system.

[0769] In particular, the deletion of compiled code from a system can give rise to problems. There may be a link from a section of compiled code which is not being deleted into a section of deleted code. Particular problems can arise because of the proposed use of a native call instruction (or equivalent) to emulate a call or invoke in the virtual machine; this would typically leave the address where execution is to continue once the called method is complete (the “return address”) on that stack for that thread, at or near the stack point when the call or invoke is performed. If the native call instruction is part of a compiled version of a section of code, then the return address will point into the compiled version. This causes no problems until the point of deletion of the compiled version. The return address cannot be left pointing to where the compiled version used to be. If, during execution, a thread tried to return to the address where the compiled code used to be, an error would occur and execution by that thread would usually terminate. In such an arrangement, it would be necessary to perform a check at each place where a return is about to be performed to ensure that it is safe to perform a return operation.

[0770] The present invention seeks to mitigate this and/or other problems.

[0771] The solution to these problems in a preferred embodiment of the invention is, at the point of deletion of the compiled code, to perform an examination of the virtual machine, looking for cases where a return address exists in the stacks that points to a position within the piece of compiled code to be deleted, and to re-arrange the thread's stack contents to allow seamless continuation of execution of that thread without the compiled version of the code which is about to be deleted. The mechanism is preferably arranged such that the cost of the operation is borne at the time of deletion, with little or no extra cost at normal call/return time, since the relative frequency of the two situations is such that there are many more call/return operations than code deletion operations.

[0772] Accordingly, the invention in one aspect provides a method of examining memory in a computer system to allow a section of compiled code to be deleted, the method including:

[0773] examining a frame of a stack in the computer system;

[0774] identifying whether the frame contains a return address which is in the range of addresses of the section of compiled code; and

[0775] altering the contents of the frame when such a return address is identified.

[0776] By carrying out the above method, the problems associated with leaving a return address pointing into a section of compiled code to be deleted can be overcome.

[0777] In a closely related aspect of the present invention, there is provided a method of deleting compiled code in a computer system, including:

[0778] selecting a section of compiled code to be deleted;

[0779] examining a frame of a stack in the computer system;

[0780] identifying whether the frame contains a return address which is in the range of addresses of the section of compiled code;

[0781] altering the contents of the frame when such a return address is identified; and

[0782] deleting the section of compiled code.

[0783] Preferably any such return address is changed to the address of a piece of continuation code. The continuation code enables execution to continue after the return without the code to be deleted. Preferably, the continuation code is arranged to transfer control to an interpreter. The continuation code may be arranged so that subsequent instructions are interpreted, for example, until a section of emulated instructions is encountered for which there is a compiled version, or alternatively, to jump to another compiled version of the code to be deleted, if such a version exists. The use of a fallback interpreter for the execution of instructions subsequent to the return allows execution of the instructions of the deleted compiled code without the overhead of creating a new compiled version of the instructions.

[0784] If the frame contains such a return address, preferably, values in the frame are changed. Preferably, values in the frame are arranged to enable execution to continue without the code to be deleted. For example, temporary register information which is stored in the frame may be changed to take into account optimisations which were made when the code to be deleted was compiled. Such changes may be required, for example, where control is to be transferred to an interpreter for subsequent execution (see Agent's reference no. 1 of this specification).

[0785] Preferably the alteration of the frame is carried out at the time of deletion. Thus, none of the links and return addresses of the frame will point into the compiled code after deletion, and time consuming checks during execution can be avoided.

[0786] Preferably, a plurality of frames in the stack are examined. For example, all frames in the stack may be examined, or else, each frame which may contain a return address pointing into the section of compiled code is examined.

[0787] In a preferred embodiment of the invention, the computer system operates a multi-threaded environment. Each thread has its own stack.

[0788] Preferably, the stacks of a plurality of threads in the computer system are examined. For example, the stacks of all threads in the computer system may be examined, or else, the stack of every thread to which the code to be deleted may have had access is examined.

[0789] In this way it can be ensured that no return addresses point into the section of compiled code to be deleted.

[0790] It may be known that some threads cannot have had access to the section of compiled code to be deleted. Execution time can be saved by not examining the stacks of such threads.

[0791] For the thread of the stack being examined, however, it will often be necessary to stop the thread while the examination is carried out. Alternatively, a return barrier may be inserted to restrict the thread to certain sections of the code (see Agent's Reference No. 5 in this specification).

[0792] In another aspect of the present invention, there is provided a method of deleting compiled code in a computer system, including:

[0793] examining each frame of each stack of each thread in the system;

[0794] identifying whether a return address points to a portion of compiled code which is to be deleted; and

[0795] rearranging the contents of each stack containing the return address so as to enable that thread to continue execution without that portion of the compiled code which is to be deleted.

[0796] In a further aspect of the invention, there is provided a method of deleting a section of compiled code in a computer system, the method including examining the memory of the computer system, identifying a link to the section of compiled code, and altering the link.

[0797] The link to the section of compiled code is preferably a return address in a frame. Thus, the return address identified when examining a frame is preferably a return address which is in the range of addresses of the section of compiled code to be deleted.

[0798] Alternatively, or in addition, the examination of the memory may identify a patch or other jump to the compiled code to be deleted. The link may be a direct or an indirect link to the compiled code to be deleted. For example, the link may be via a section of glue code to the section of compiled code.

[0799] Preferably, the computer system is configured as a virtual machine.

[0800] In a further aspect of the present invention, there is provided an apparatus for examining memory in a computer system to allow a section of compiled code to be deleted, including:

[0801] means for examining a frame of a stack in the computer system;

[0802] means for identifying whether the frame contains a return address which is in the range of addresses of the section of code to be deleted; and

[0803] means for altering the contents of the frame.

[0804] In another aspect of the present invention there is provided apparatus for deleting compiled code in a computer system, including:

[0805] means for selecting a section of compiled code to be deleted;

[0806] means for examining a frame of a stack in the computer system;

[0807] means for identifying whether the frame contains a return address which is in the range of addresses of the section of compiled code to be deleted;

[0808] means for altering the contents of the frame; and

[0809] means for deleting the section of compiled code.

[0810] The apparatus may further include means for executing subsequent instructions, and the means for arranging the contents of the frame may be adapted to change any such return address to the address of the means for executing subsequent instructions. Preferably, the apparatus further includes a fallback interpreter. The means for executing subsequent instructions may be arranged to interpret subsequent instructions until a section of emulated instructions is encountered for which there is a compiled version.

[0811] The means for arranging the contents of the frame may be adapted to alter values in the frame to enable execution to continue without the code to be deleted, if the frame contains such a return address.

[0812] In a preferred embodiment of the invention, a record is kept of the optimisations which have been carried out in compiling code so that “clean up” information will be available as to what alterations are required to update the values to allow for the subsequent execution, for example, by the interpreter. For a computer system including the apparatus, preferably the system further includes a compiler system, the compiler system including a recorder for recording “clean up” information as the code is compiled.

[0813] The means for examining a frame in the stack may be adapted to examine a plurality of frames in the stack. The means for examining a frame in the stack may be adapted to examine the stack of each of a plurality of threads in the computer system.

[0814] The invention also provides a virtual machine including the apparatus described above.

[0815] The invention further provides a computer system including the apparatus described above.

[0816] In another aspect, the invention provides a computer system including means for deleting compiled code, further including means for examining each frame of each stack of each thread in the system, means for identifying whether a return address points to a portion of compiled code which is to be deleted, and means for rearranging the contents of each stack containing the return address so as to enable that thread to continue execution without that portion of compiled code about to be deleted.

[0817] The invention further provides a computer-readable storage medium having a programme recorded thereon for carrying out a method as described above.

[0818] The features of any of the above aspects may be provided with any other aspect, in any appropriate combination. Apparatus features may be applied to the method aspects and vice versa.

[0819] Preferred features of the present invention will now be described, purely by way of example, with reference to the accompanying drawings, in which:

[0820] FIG. 6A illustrates the principle of a virtual machine;

[0821] FIG. 6B illustrates the operation of an emulator stack;

[0822] FIG. 6C illustrates the operation of a unified stack;

[0823] FIG. 6D shows an embodiment of the present invention; and

[0824] FIG. 6E shows an apparatus embodiment of the present invention.

[0825] Prior to a description of a preferred embodiment, background to the preferred embodiment will first be discussed.

[0826] A virtual machine allows software which has been written for one operating system to run on another operating system; the software is then termed ‘non-native’ software. In order to allow the non-native software to run, the virtual machine emulates the operation of the operating system for which the software was written. This situation is illustrated in FIG. 6A. The virtual machine 5004 translates the instructions of the non-native software 5002 into native instructions which can be run by the host operating system 5006. Conventional emulators work by interpreting the non-native instructions during execution.

[0827] Any execution path, or ‘thread,’ will have a stack associated with it. A stack is an area in memory that stores frames consisting of temporary register information and return addresses of subroutines. In the conventional emulator, the non-native application has its own stack (the emulator stack) separate from the stack of the host operating system.

[0828] An example of the operation of the emulator stack is shown in FIG. 6B. Referring to that Figure, a section of non-native code 5008 has a call instruction at address aaa which calls a subroutine 5010 located at address bbb. When the emulator encounters the call instruction, the address aaa (the return address) is put onto the emulator stack 5009, together with temporary register information, and the path of execution then jumps to address bbb. At the end of the subroutine the emulator encounters a return instruction. It then takes the return address from the stack, together with the register information, and returns to the instruction following the call instruction in the main routine.

[0829] In the virtual machine of the preferred embodiment, rather than interpreting the non-native instructions, part or all of the instructions are compiled into native instructions that can run on the host operating system. Although a certain amount of time is required for the compilation, significant time savings can be made when running the compiled code.

[0830] Time savings can be made in various ways. Firstly, if a section of code is to be executed more than once, then it will be more efficient to execute a compiled version. Secondly, as described above in Agent's reference no. 1 of this specification, various assumptions may be made during compilation that allow optimisation of the compiled code. Thirdly, time savings can be made by using the host operating system's stack, and by using native call instructions (rather than emulated call instructions) to call subroutines.

[0831] Referring to FIG. 6C, non-native main routine 5008 and non-native subroutine 5010 are compiled into native main routine 5012 and native subroutine 5014. Call instruction 5016 at address xxx is a native call instruction. When this call instruction is encountered, the address xxx (the return address) is put onto the host stack 5015, together with temporary register values, and the instructions in the subroutine at address yyy are picked up. When the return instruction at the end of the subroutine is encountered, the return address and register values are pulled from the host stack, and execution of the main routine resumes.

[0832] When using compiled code in the way described above, in some circumstances it may be desirable or necessary to delete certain sections of compiled code. This may be because the memory area in which the compiled code is stored is required elsewhere, or because assumptions that were made during compilation are no longer valid. Also, it is desirable to remove any code which is not expected to be required in the future, particularly when working in a limited memory environment.

[0833] A problem arises if a section of compiled code is discarded while the processor is executing a subroutine that has been called from that section of code. In this situation, a return address is left on the stack which points to a section of code that no longer exists.

[0834] According to the preferred embodiment, prior to deletion of a section of compiled code, the stack is examined frame by frame to identify any return addresses that point to the section of code to be deleted. If such a return address is identified, the address is changed to the address of a piece of continuation code referred to herein as ‘glue code’. The glue code enables execution to continue without the piece of code to be deleted. This is done either by interpreting instructions in the original, non-native code until a section of code is encountered for which there is a compiled version, or by jumping to another compiled version of the code, if this exists.
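By way of illustration only, the frame-by-frame examination may be sketched in Python as follows. The names Frame, Thread, GLUE_CODE, may_reach_code and prepare_deletion are assumptions of this sketch, as are the example addresses; the section of compiled code is modelled as a half-open address range, and a flag marks threads known not to have had access to it.

```python
from dataclasses import dataclass, field
from typing import List

GLUE_CODE = 0xFFFF0000  # hypothetical address of the continuation ('glue') code


@dataclass
class Frame:
    return_address: int


@dataclass
class Thread:
    name: str
    stack: List[Frame] = field(default_factory=list)  # youngest frame last
    may_reach_code: bool = True


def prepare_deletion(threads, start, end):
    """Redirect every return address in [start, end) to the glue code.

    Returns the number of frames altered. The compiled code itself would be
    deleted only after this walk has completed.
    """
    patched = 0
    for t in threads:
        if not t.may_reach_code:
            continue                     # such stacks need not be examined
        for frame in reversed(t.stack):  # youngest to oldest
            if start <= frame.return_address < end:
                frame.return_address = GLUE_CODE
                patched += 1
    return patched
```

The cost of the walk is borne once, at deletion time, so that ordinary calls and returns need no additional check.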

[0835] A discussion of the use of glue code and the transfer of execution between compiled and non-compiled code, and between compiled and compiled code, can be found in Agent's reference no. 1 of this specification.

[0836] As noted above, when a subroutine is called, temporary register information is also put onto the stack, in the same frame as the return address. Since various optimisations may have been made during the compilation of the code, this register information may only be valid if the rest of the compiled code is executed. For example, when a section of code is compiled, the compiler may have identified that not all parameters are needed in that section of code. In that case, some of the register information may have been left out, since it is not needed for executing the rest of the compiled code. However, if execution then returns to the original interpreted code, all of the parameters are needed (since the interpreter cannot look forward to see which parameters are or are not needed). Thus, it may be that missing register information needs to be added, before the interpreted version of the code can be executed.

[0837] The problem of incorrect register information could be avoided by making sure that, when a subroutine is called, all of the register information which is put on the stack is valid even if the rest of the compiled code were not executed. Alternatively, when optimisations are made which affect the register information, this fact could be recorded, together with the necessary information to allow the optimisations to be undone, should the rest of the compiled code not be executed. When a frame with a return address is identified, the glue code can then examine the optimisations which have been made, and change the register information in that frame, where necessary.
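By way of illustration only, the recording of information that allows an optimisation to be undone may be sketched in Python as follows. The form of the record is entirely an assumption of this sketch: the names record_cleanup and clean_up_frame, the table keyed by return address, and the use of a small function to recompute each omitted value are hypothetical and are not taken from the specification.

```python
def record_cleanup(cleanup_table, return_address, recipes):
    """Compiler side: note, per call site, how to recover values left out of the frame.

    recipes maps a register name to a function that recomputes its value from
    the registers that were stored (an assumed representation).
    """
    cleanup_table[return_address] = dict(recipes)


def clean_up_frame(frame, cleanup_table):
    """Deletion side: add any missing register information before interpretation."""
    recipes = cleanup_table.get(frame["return_address"], {})
    for name, recompute in recipes.items():
        if name not in frame["registers"]:
            frame["registers"][name] = recompute(frame["registers"])
    return frame
```

A frame whose return address has no entry in the table is left unchanged, corresponding to a call site at which no such optimisation was made.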

[0838] The preferred embodiment is designed to operate in a multi-threaded environment, that is, an environment in which there are two or more processors, or threads, running asynchronously but sharing the same work space. Each thread has its own stack. In the preferred embodiment, either the stack of every thread to which the compiled code may have had access is examined or, more simply, the stack of every thread is examined.

[0839] In order to examine a stack, the thread to which that stack relates is stopped for a certain period of time. In one example, the thread is stopped while all of the frames in the stack are examined. In another example, the thread is paused for long enough to examine the most recent frame, or a predetermined number of most recent frames, on the stack. Once these frames have been examined, a ‘return barrier’ is inserted into the stack, in the way described in Agent's reference no. 5 of this specification. The thread can then be allowed to continue execution for as long as the stack stays above the return barrier.

[0840] Referring to FIG. 6D, operation of a preferred embodiment will now be described.

[0841] In step 5020 it is decided that a certain code buffer is to be deleted. A code buffer is an area in memory that stores compiled code. In step 5022 a thread is selected whose stack is to be examined. In step 5024 that thread is stopped. Optionally, in step 5026, a return barrier is inserted into the stack, and operation of the thread is allowed to continue for as long as the stack stays above the return barrier.

[0842] In step 5028 a frame on the stack is selected. The first frame to be examined will typically be the youngest frame on the stack. In step 5030 the selected frame is examined to see whether it contains a return address in the buffer that is to be deleted.

[0843] If it is found that there is such a return address, then in step 5032 the fragment within the buffer that the return address points to is identified. In step 5034 the other frame fields are adjusted to ‘clean up’ any optimisations of local variable values, or of variables specific to the virtual machine, that may have been made. In step 5036 the return address is changed to point to a piece of glue code.

[0844] If the frame contains no return address into the buffer to be deleted, or once the values in the frame have been adjusted, then in step 5038 it is determined whether all frames in the stack have been examined. If not, then in step 5040 the next youngest frame in the stack is selected, and that frame is examined. Once all of the frames in the stack have been examined, then in step 5042 the thread is restarted, or the return barrier is removed. In step 5044 it is determined whether the stacks of all threads have been examined. If not, then another thread whose stack has not been examined is selected and the process is repeated.

[0845] Once all of the frames in all of the stacks in all of the threads have been examined and the appropriate changes to the stack contents have been made, then in step 5046 the code buffer is deleted.
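
Purely by way of illustration, the procedure of FIG. 6D might be sketched in Python as follows. This is a simplified model under stated assumptions, not the implementation of the preferred embodiment: the names `Frame`, `ThreadState`, `GLUE_CODE_ADDRESS` and `patch_stacks_for_deletion` are hypothetical, a stack is modelled as a list with the youngest frame last, the ‘clean up’ of step 5034 is reduced to clearing a flag, and the optional return barrier of step 5026 is omitted.

```python
from dataclasses import dataclass, field

GLUE_CODE_ADDRESS = 0xFFFF0000  # hypothetical address of the shared glue code


@dataclass
class Frame:
    return_address: int
    # True when compiled-code optimisations left this frame's register
    # information incomplete from the interpreter's point of view.
    optimised: bool = False


@dataclass
class ThreadState:
    name: str
    stack: list = field(default_factory=list)  # youngest frame is last
    running: bool = True


def patch_stacks_for_deletion(threads, buf_start, buf_end):
    """Redirect return addresses inside [buf_start, buf_end) to glue code.

    Returns the number of frames patched. Once this returns, no stack
    refers to the buffer, so the caller may delete it (step 5046).
    """
    patched = 0
    for thread in threads:                        # steps 5022 and 5044
        thread.running = False                    # step 5024: stop the thread
        for frame in reversed(thread.stack):      # steps 5028/5040: youngest first
            if buf_start <= frame.return_address < buf_end:  # step 5030
                frame.optimised = False           # step 5034: undo optimisations
                frame.return_address = GLUE_CODE_ADDRESS     # step 5036
                patched += 1
        thread.running = True                     # step 5042: restart the thread
    return patched
```

In this sketch a caller would invoke `patch_stacks_for_deletion` with the address range of the code buffer and free the buffer only after the function has returned.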

[0846] Referring now to FIG. 6E, apparatus for putting the present embodiment into effect will be described.

[0847] FIG. 6E shows a computer system including a virtual machine 5050 which allows non-native code 5052 to run on host computer 5054. The virtual machine includes control means 5056, interpreter 5058 which interprets non-native application code, compiler 5060 which compiles sections of non-native application code, and ‘stack walker’ 5062. The host computer includes a processor 5064 and memory 5068. In FIG. 6E a single processor is shown which executes several threads simultaneously by appropriate division of its time between the various threads, but two or more processors could be provided, each executing one or more threads.

[0848] Compiled code 5070 which has been compiled by compiler 5060 is stored in memory 5068. Also located in memory 5068 are a number of stacks 5072, 5073, 5074 corresponding to the number of threads that are being executed by the processor 5064.

[0849] In operation, the control means 5056 may decide at a certain time that a section of compiled code 5070 should be deleted, for example to allow this area of memory to be used for other purposes. The control means then indicates to stack walker 5062 that this section of code is to be deleted. The stack walker pauses operation of each thread in turn, and examines the frames in the stacks of the threads to identify any frames which contain return addresses which are in the area of memory containing the section of code to be deleted. Any such addresses are changed to the address of a piece of glue code 5076, and other fields in the frame are adjusted to ‘clean up’ any optimisations of local variable values, or of variables specific to the virtual machine, that may have been made. The glue code operates in the way described above with reference to FIG. 6D. Once all of the frames in all of the stacks in all of the threads have been examined and the appropriate changes to the stack contents have been made, the stack walker 5062 indicates to the control means 5056 that the section of code may be deleted. The control means 5056 then controls deletion means 5078 to delete the section of compiled code 5070.

[0850] In summary, at code deletion time, each thread in the virtual machine is paused in turn, and the stacks of these threads are scanned, looking for return address values which point at code which is to be deleted. Once one of these cases is found, the state of the stack around the return address value is adjusted to “clean up” the virtual machine state for that thread at the point where the return is encountered (i.e., some time in the future for that thread), and the return address value itself is adjusted to cause the flow of execution to transition to one of a small number of central pieces of code. These centralised pieces of code (termed “glue code”) perform some generalised checks and cause the continuation of the flow of execution for that thread in the appropriate manner; usually this will involve interpretation of subsequent emulated instructions until a section of emulated instructions is encountered for which there is a compiled version.

[0851] In any or all of the aforementioned, certain features of the present invention have been implemented using computer software. However, it will of course be clear to the skilled man that any of these features may be implemented using hardware or a combination of hardware and software. Furthermore, it will be readily understood that the functions performed by the hardware, the computer software, and such like are performed on or using electrical and like signals.

[0852] Features which relate to the storage of information may be implemented by suitable memory locations or stores. Features that relate to the processing of information may be implemented by a suitable processor or control means, either in software or in hardware or in a combination of the two.

[0853] In any or all of the aforementioned, the invention may be embodied in any, some, or all of the following forms: it may be embodied in a method of operating a computer system; it may be embodied in the computer system itself; it may be embodied in a computer system when programmed with or adapted or arranged to execute the method of operating that system; and/or it may be embodied in a computer-readable storage medium having a program recorded thereon which is adapted to operate according to the method of operating the system.

[0854] As used herein throughout the term ‘computer system’ may be interchanged for ‘computer,’ ‘system,’ ‘equipment,’ ‘apparatus,’ ‘machine,’ and like terms. The computer system may be or may include a virtual machine.

[0855] In any or all of the aforementioned, different features and aspects described above, including method and apparatus features and aspects, may be combined in any appropriate fashion.

[0856] It will be understood that the present invention(s) has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.

[0857] Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

[0858] Agent's Reference No. 7 - Computer System, Computer-Readable Storage Medium and Method of Operating Same, and Method of Operating That System

[0859] The present invention relates to a method of operating a garbage collector (especially a concurrent garbage collector) in a computer system, to a computer and computer system for garbage collection, to a computer-readable storage medium and to a Virtual Machine. In a preferred embodiment, the present invention relates to grey packets: low-contention grey object sets for concurrent marking garbage collection in a highly multi-threaded environment.

[0860] At a general level, the invention is applicable to run-time environments; at a more specific level it is applicable to automatic dynamic memory management.

[0861] Reference is made herein to “memory objects”. These are typically arbitrary discrete areas of memory organised into fields, some of which may be references to other objects or even to the same object (not to be confused with the objects in object oriented programming).

[0862] For efficient use of memory in a computer system, it is important that some mechanism is in place which will allow memory to be released for reallocation so that it may be used again once its current use is expended.

[0863] Such ‘memory management’ may typically be ‘manual,’ where the program itself contains code indicating that it requires memory to perform a function and code indicating when it has finished using that memory, or ‘automatic’ where the program does not inform the computer system when it has finished with memory and instead the system itself has to implement some way of identifying and recovering expended memory. The latter is conveniently referred to as ‘garbage collection’ and relies on the computer system initiating a process in which it searches through the memory objects currently being utilised by a program. Any such objects which are encountered during the search are regarded as currently in use whilst others not encountered cannot be currently in use and may be regarded as dead and therefore available for reallocation.

[0864] In previous attempts to effect garbage collection (GC), three specific techniques have been proposed. In the first, known as ‘reference counting,’ a count of the number of references or pointers to each memory object is maintained and the system looks for an occasion when a count falls to zero, thereby indicating that the object previously pointed to has become ‘free’ for reallocation. A disadvantage with this technique is that it is inefficient in multi-threaded environments and is unable to detect when cyclic structures (for example, when object A refers to object B, which refers back to A again) have become garbage.
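
The cyclic-structure limitation can be illustrated with a minimal sketch. The class `Obj` and the helpers `add_ref` and `drop_ref` are hypothetical names introduced only for this example; the count held by an external root is modelled by setting `refcount` to 1 directly.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refcount = 0   # number of references currently pointing here
        self.refs = []      # objects this object refers to


def add_ref(src, dst):
    """Record that src now refers to dst."""
    src.refs.append(dst)
    dst.refcount += 1


def drop_ref(obj, freed):
    """Remove one reference to obj; free it (and release its children) at zero."""
    obj.refcount -= 1
    if obj.refcount == 0:
        freed.append(obj.name)
        for child in obj.refs:
            drop_ref(child, freed)
        obj.refs.clear()


# Acyclic case: C -> D, with one root reference to C.
c, d = Obj("C"), Obj("D")
c.refcount = 1
add_ref(c, d)
acyclic_freed = []
drop_ref(c, acyclic_freed)      # both objects are released

# Cyclic case: A -> B -> A, with one root reference to A.
a, b = Obj("A"), Obj("B")
a.refcount = 1
add_ref(a, b)
add_ref(b, a)
cyclic_freed = []
drop_ref(a, cyclic_freed)       # A's count only falls to 1; nothing is released
```

After the root reference is dropped in the cyclic case, neither object is reachable, yet each still holds a count of one from the other, so neither is ever freed.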

[0865] In the second technique, known as ‘copying,’ memory is divided into two sections, identified in FIG. 7A as the ‘FROM space’ 12102 and the ‘TO space’ 12104. Memory for objects is allocated at linearly increasing addresses within FROM space 12102 until it is full. At that point all work is forced to stop for GC, which copies all live objects 12106 into a more compact area 12108 in the ‘TO space’ 12104. References are also changed at the same time to take account of the new locations in the ‘TO space’ 12104. The roles of the FROM and TO spaces are then reversed and new memory allocation continues, but now using the TO space in the same way as the previous FROM space was used. The major disadvantages with this technique are the additional memory requirement and the down time incurred every time a GC routine is run and the roles of the FROM and TO spaces are exchanged.
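
As a rough sketch of the copying technique, the following Python function copies live objects into a compact TO space and rewrites references to the new locations. The representation is an assumption made for illustration: FROM space is a dictionary mapping an address to the list of addresses held in that object's fields, TO space is a list indexed by new address, and `copy_collect` is a hypothetical name. The breadth-first scan used here is one common way of performing the copy; the passage above does not prescribe a particular traversal order.

```python
def copy_collect(from_space, roots):
    """Copy every object reachable from roots into a compact TO space.

    from_space: dict mapping old address -> list of old addresses (fields).
    roots: list of old addresses.
    Returns (to_space, new_roots), where to_space[i] lists the new
    addresses referenced by the object now living at address i.
    """
    to_space = []      # objects at linearly increasing addresses 0, 1, 2, ...
    forwarding = {}    # old address -> new address

    def forward(old):
        # Copy the object on first encounter; fields still hold old addresses.
        if old not in forwarding:
            forwarding[old] = len(to_space)
            to_space.append(list(from_space[old]))
        return forwarding[old]

    new_roots = [forward(r) for r in roots]
    scan = 0
    while scan < len(to_space):
        # Rewrite this object's fields, copying referents as they are found.
        to_space[scan] = [forward(ref) for ref in to_space[scan]]
        scan += 1
    return to_space, new_roots


example_from = {10: [30], 20: [10], 30: [10], 40: []}
example_to, example_roots = copy_collect(example_from, [10])
```

Objects at addresses 20 and 40 are not reachable from the root and are therefore not copied; their space is reclaimed when the roles of the two spaces are exchanged.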

[0866] The third technique, a so-called ‘mark/sweep’ technique, involves all memory being located in one logical unit containing objects. GC is invoked when there is no region of memory in the heap large enough to satisfy an allocation request, at which point it will colour all objects “white” and trace all possible paths through references to live objects. Any objects reached by the GC are coloured “black” and regarded as live, while areas not reached remain “white” and can be regarded as dead and available for reallocation. The final stage of the technique involves a ‘sweep’ operation in which all areas marked white are released and work is allowed to continue.

[0867] In more detail, with the mark/sweep technique, as can be seen from FIG. 7B, in the marking (tracing) phase, when an object is encountered but not all of the objects it refers to have been visited, it is marked as grey and references to it are put into a data structure 12202 in the form of a memory stack termed the grey stack. (In this connection, a typical memory stack may be regarded as a memory store of variable size in which items are successively added from top to bottom so that the ‘youngest’ items are nearest the bottom of the stack. This convention is for illustrative purposes only. It makes no difference to the operation of the stack whether new items are systematically added to the top or to the bottom.) FIG. 7B also shows at 12204 a memory heap, which is the storage area for memory objects, including their coloration.

[0868] The references themselves are also investigated by looking at the first reference in the data structure. That reference is removed from the grey stack and the object it refers to is coloured “black.” Then any references 12206 in the object to other objects which have not yet been encountered by the tracing process are pushed onto the grey stack, and those objects are recoloured “grey.” Each object, shown enlarged at 12208 for convenience in FIG. 7B, includes an indication 12210 of the black/white status of the reference and pointers such as 12212 to other objects in a stack. The process is repeated until the grey stack is empty. Subsequent to the tracing process there is the sweep phase in which what is black is made white and what is white is made available for future use. At the end of the garbage collection, it will be understood that the grey stack ceases to exist.

[0869] The major disadvantages of the mark/sweep (tracing) technique are the down time incurred while work stops, and its greater complexity compared with either of the two previous techniques. Its major advantage over copying GC is that there is little or no spatial redundancy.

[0870] In so-called ‘concurrent’ environments, objects may be manipulated whilst they are being traced. With reference to FIG. 7C, specifically a reference field “b” (12306) in an object could be updated to refer to a different reference “d” (12308). If the object A being updated (designated 12302) is “black” (that is, it has been fully traced by the GC) while the new object B (designated 12304) is “white”, then there is a risk that B could be mis-identified as dead if A becomes the only route to B. This occurs because the GC has no reason to revisit A, so B will never be traced. Systems using concurrent GC use a “write barrier” to trap such situations, colouring B objects “grey” and pushing references to them onto the grey stack. Since there is normally only one grey stack for each Virtual Machine, there are likely to be contentions for usage of memory and of the grey stack when under use by GC.
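
A write barrier of the kind just described might be sketched as follows. This is one illustrative form only, since different systems implement the barrier differently; the names `Obj` and `write_field` are hypothetical, and the grey stack is modelled as a plain list with no locking.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Obj:
    def __init__(self, name, colour=WHITE):
        self.name = name
        self.colour = colour
        self.fields = {}


def write_field(obj, field_name, target, grey_stack):
    """Store target in obj's field, trapping the black-to-white case.

    If a fully traced (black) object is made to refer to an untraced
    (white) object, the target is coloured grey and pushed onto the grey
    stack so that the collector will still trace it.
    """
    if obj.colour == BLACK and target is not None and target.colour == WHITE:
        target.colour = GREY
        grey_stack.append(target)
    obj.fields[field_name] = target


grey_stack = []
obj_a = Obj("A", colour=BLACK)   # already fully traced by the GC
obj_b = Obj("B")                 # not yet encountered
write_field(obj_a, "b", obj_b, grey_stack)
```

Every such store by a non-GC thread may touch the shared grey stack, which is the source of the contention noted above.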

[0871] Indeed, the set of grey objects is a resource shared amongst several threads of control, all of which could alter it. Hence any alteration must be policed by a locking mechanism of some kind. The grey set is used heavily during the tracing process, so there is a high probability that any attempt to gain access to the grey set will find it already in use. In addition, any overheads incurred by the locking mechanism will tend to be magnified. In other words, in concurrent GC other parts of the system can be attempting to alter objects while the GC is still tracing through methods to locate the live and dead memory locations. Special measures may need to be taken in order to prevent a live object being identified incorrectly as dead and thereby being reallocated. Corruption and/or loss of data could thereby result. A typical solution to this problem has been to use a ‘write barrier’ on all operations which could alter the contents of objects.

[0872] A further problem for GC is that space for the entire grey stack has to be allocated at the start of the GC cycle and usually has to be large enough to cope with the worst eventuality, even though it is highly unlikely that that will occur. Hence, most of the space set aside for the grey stack is wasted.

[0873] The invention can be regarded as relating in one aspect to the management of the grey queue (or stack) in order to overcome the problem that there is a lot of contention for access to the grey stack.

[0874] In one aspect the present invention provides a method of operating a garbage collector in a computer system, the garbage collector having (typically at a given time) a set of partially traced memory objects (typically ‘grey’ objects), the method including handling the set of partially traced memory objects in a plurality of discrete packets (or dividing the set of partially traced memory objects into the plurality of discrete packets).

[0875] By handling the set in a plurality of discrete packets, the set only occupies the space that it needs to. This can be contrasted with the known grey stack, which is essentially of large, fixed size.

[0876] The garbage collector may, for example, be an incremental or pausing garbage collector. However, preferably, for speed of operation, the garbage collector is a concurrent garbage collector. (Typically a concurrent garbage collector operates concurrently with the execution of at least one other thread of control; that is, it does not prevent mutation occurring at the same time as the garbage collection. In a non-concurrent garbage collector the collector's thread is the only thread which is running, and so no locking is required.) In this case, preferably each packet is accessible by at most one thread of control at any given time. This can limit the amount of locking required to the occasions when a thread finishes with one packet and needs another to work on. This, in turn, can improve the performance of a GC in a very heavily used system and/or reduce the memory requirement of the computer system, by releasing memory no longer in use.

[0877] Preferably, different packets can be accessed by different threads of control at the same time. This can enhance the degree of concurrency in the system.

[0878] In order to enhance concurrency, the packets are preferably treated separately so that they can be used by different threads.

[0879] Preferably, each packet that is currently in use by a particular thread of control is marked as ‘checked out’ and each packet that currently has no particular thread of control using it is marked as ‘checked in’, and only checked out packets can be operated on by the particular thread of control, whereas for each checked in (grey) packet preferably a mutual exclusion lock is imposed before its contents can be read by a thread. This can afford a convenient way of managing the packets.

[0880] The minimum number of packets is two, as described later: one is for filling up with references to grey objects, the other is for emptying during “blackening.” The packets are preferably sufficiently long to afford the advantages of division into packets and avoid the disadvantage of using too much memory (especially when multiple threads are executing), but preferably not so long that they are unmanageable and give rise to an excessive number of locks. Hence, preferably each packet contains a number of slots, one per reference to an object, the number being one of at least 2, 5, 10, 50 or 100. Equally, preferably each packet contains a number of slots, one per reference to an object, the number being one of less than 5,000, 1,000, 500 or 100. These rough sizes have been found to be optimum over a wide range of uses.

[0881] A less important measure of the size of the packets is their length in terms of the number of bytes. Preferably, this is a power of two. Preferably, each packet is one of at least 8, 16, 32, 64, 128 and 256 bytes long. Preferably, each packet is one of less than 1024, 512, 256, 128 and 64 bytes long.

[0882] Preferably, each packet is of a fixed size. Preferably, each packet contains a fixed number of slots and an indication (typically a header) of the number of slots currently in use within that packet.
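
By way of illustration only, a packet with a fixed number of slots and a header recording the number of slots in use might be modelled as below. The class name `GreyPacket`, its method names and the choice of Python exceptions are assumptions made for this sketch; object references are represented by arbitrary Python values.

```python
class GreyPacket:
    """A fixed-size packet of references to grey objects."""

    def __init__(self, capacity):
        self.slots = [None] * capacity  # fixed number of slots
        self.in_use = 0                 # header: slots currently occupied

    def is_full(self):
        return self.in_use == len(self.slots)

    def is_empty(self):
        return self.in_use == 0

    def push(self, ref):
        """Record a reference to a newly greyed object."""
        if self.is_full():
            raise OverflowError("packet full; a fresh packet is needed")
        self.slots[self.in_use] = ref
        self.in_use += 1

    def pop(self):
        """Remove a reference so that its object can be blackened."""
        if self.is_empty():
            raise IndexError("packet empty")
        self.in_use -= 1
        ref = self.slots[self.in_use]
        self.slots[self.in_use] = None
        return ref
```

Because the capacity is fixed at construction, the only state that changes during use is the slot contents and the `in_use` count, which is what allows a packet to be handed between threads as a complete unit.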

[0883] In order to save on memory requirement, the packets are preferably created and destroyed in accordance with demand. In other words, the packets are dynamically managed in that they can be created or destroyed as required. As described later, the number of packets in existence is a function of the interval between the marking process and the blackening process.

[0884] Destruction of the packets may be achieved at least in part by merging together the contents of partially full packets. This feature can save on memory requirement.
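
The merging of partially full packets can be sketched as a simple repacking step. In this hypothetical example each packet is modelled as a Python list holding at most `capacity` references, and `merge_partial_packets` is an illustrative name; how and when the preferred embodiment performs the merge is not assumed here.

```python
def merge_partial_packets(packets, capacity):
    """Repack references from partially full packets into fewer packets.

    packets: list of lists, each holding at most `capacity` references.
    Returns (merged, freed), where merged is the repacked list of packets
    and freed is the number of packets that can now be destroyed.
    """
    refs = [ref for packet in packets for ref in packet]
    merged = [refs[i:i + capacity] for i in range(0, len(refs), capacity)]
    return merged, len(packets) - len(merged)


merged_example, freed_example = merge_partial_packets([[1, 2], [3], [4, 5, 6]], capacity=4)
```

Here three partially full packets holding six references between them are repacked into two, so one packet can be destroyed.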

[0885] In a closely related aspect, the present invention provides a computer system including a garbage collector, the garbage collector having a set of partially traced memory objects, and means for handling the set in a plurality of discrete packets.

[0886] Preferably, the garbage collector is a concurrent garbage collector.

[0887] Preferably, each packet is accessible by at most one thread of control at any given time.

[0888] Preferably, the computer system further includes means for rendering (or is adapted to render) different packets accessible by different threads of control at the same time.

[0889] Preferably, the computer system further includes means for treating (or is adapted to treat) the packets separately so that they can be used by different threads.

[0890] Preferably, the computer system further includes means for marking (or is adapted to mark) each packet that is currently in use by a particular thread of control as ‘checked out’ and each packet that currently has no particular thread of control using it as ‘checked in,’ and means for permitting operation only on checked out packets by the particular thread of control.

[0891] Preferably, each packet contains a number of slots, one per reference to an object, the number being one of at least 2, 5, 10, 50 or 100. Preferably also, each packet contains a number of slots, one per reference to an object, the number being one of less than 5,000, 1,000, 500 or 100. Each packet may be of a fixed size. Each packet may contain a fixed number of slots and an indication of the number of slots currently in use within that packet.

[0892] Preferably, the computer system further includes means for creating and destroying (or is adapted to create or destroy) the packets in accordance with demand.

[0893] Preferably, the computer system further includes means for destroying (or is adapted to destroy) the packets at least in part by merging together the contents of partially full packets.

[0894] In a closely related aspect, the invention provides a method of operating a concurrent garbage collecting system in a computer system in a multi-threaded environment, so as to release memory no longer in use, including:

[0895] tracing the state of each object in a memory group;

[0896] allocating an identifier according to whether the object has not yet been encountered during the tracing process (white), the object and all objects to which it refers have been encountered by the tracing process (black), or the object itself has been encountered but some of the objects it refers to have not yet been visited (grey);

[0897] dividing the set or sets allocated with the grey identifier into discrete packets; and

[0898] assigning a respective packet to each of the threads such that each thread can work on its respective packet independently of the other thread(s) and packet(s).

[0899] In a closely related aspect, the invention provides a computer system including:

[0900] a concurrent garbage collector (preferably a run time engine);

[0901] means for tracing the state of each object in a memory group;

[0902] means for allocating an identifier according to whether the object has not yet been encountered by the tracing means (white), the object and all objects to which it refers have been encountered by the tracing means (black), or the object itself has been encountered but some of the objects it refers to have not yet been visited (grey);

[0903] means for dividing the set or sets allocated with the grey identifier into discrete packets; and

[0904] means for assigning a respective packet to each of the threads such that each thread can work on its respective packet independently of the other thread(s) and packet(s).

[0905] The invention extends to a computer system including means for operating a concurrent garbage collection system and means for dividing the grey queue into packets such that each packet is accessible by at most one thread at any given time.

[0906] In a closely related aspect the invention provides a method of operating a concurrent garbage collection system in a computer system environment, wherein the grey queue is divided into packets, each packet being accessible by at most one thread at any given time.

[0907] Preferably, the computer system is adapted to operate in a multi-threaded environment.

[0908] Preferably, the computer system further includes a manager for the packets.

[0909] The invention extends to a computer when programmed according to the above method.

[0910] The invention extends to a computer system including a garbage collector, the garbage collector having a set of partially traced memory objects, when programmed so as to handle the set of partially traced memory objects in a plurality of discrete packets.

[0911] The invention also extends to a computer-readable storage medium having a program recorded thereon, the program providing the above method.

[0912] In a closely related aspect the invention provides a computer-readable storage medium having a program recorded thereon, the program providing a method of operating a garbage collector in a computer system, the garbage collector having a set of partially traced memory objects, the method including handling the set of partially traced memory objects in a plurality of discrete packets.

[0913] The invention extends to a Virtual Machine including the above computer or computer system.

[0914] In a closely related aspect the invention provides a Virtual Machine when operated by the above method.

[0915] In a closely related aspect the invention provides a Virtual Machine when operated by means of the above computer-readable storage medium.

[0916] Preferred features of the present invention will now be described, purely by way of example, with reference to the accompanying drawings, in which:

[0917] FIG. 7A shows the division of memory according to a prior art approach;

[0918] FIG. 7B illustrates another prior art approach;

[0919] FIG. 7C shows an arrangement of objects in a so-called “concurrent” environment;

[0920] FIG. 7D shows the tracing of garbage collection work;

[0921] FIG. 7E shows the structure of an object;

[0922] FIG. 7F shows an empty stack;

[0923] FIG. 7G shows the structure of an individual packet according to the present invention; and

[0924] FIG. 7H shows the overall operation of the present invention.

[0925] First a brief outline of the nature of the invention will be presented followed by a more comprehensive description of a particular manner in which the invention can be performed.

[0926] Garbage Collection (GC) is a process whereby a run-time environment can identify memory which was in use at one time, but is now no longer in use, and make the identified memory available for re-use for other purposes. Concurrent GC is a way of implementing GC such that other activity in a program or system does not need to be impeded by ongoing GC activity.

[0927] Tracing GCs (concurrent or otherwise) work by following references, indicated as arrows 12400 in FIG. 7D, between memory objects generally indicated as 12402, starting from some given root set 12404, to establish the set of all objects which must be treated as “live.” Objects which are not in that set are deemed to be “dead” and their memory space can be recycled. The root set is some starting condition for the garbage collection, and is typically a set of public references including references on the stack of interest.

[0928] The state of the tracing process at any given time can be summarised using the Tricolour Abstraction. Each object has a colour associated with it:

[0929] White: This object has not been encountered yet during the tracing process.

[0930] Black: The object and all the objects it refers to have been encountered by the tracing process.

[0931] Grey: The object itself has been encountered, but some of the objects it refers to may not have been visited (in other words, the grey coloration effectively denotes work in progress).

Any tracing GC algorithm works as follows:

    initially, colour all objects white
    recolour grey all objects immediately referenced from the root
    while grey objects exist do
        let g be any grey object
        recolour g black
        for each object o referenced by g, do
            if o is white then
                recolour o grey
            endif
        endfor
    endwhile

Once this algorithm is complete, the space occupied by any white objects can be re-used.
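
The tracing algorithm set out above can be transcribed into runnable form as follows. The heap representation (a dictionary mapping each object name to the names it references) and the function name `trace` are assumptions made for this sketch; the grey set is held in a plain list, and "any grey object" is taken to be the most recently added one.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


def trace(heap, roots):
    """Run the tricolour tracing loop and return each object's final colour.

    heap: dict mapping object name -> list of referenced object names.
    roots: names of the objects immediately referenced from the root.
    """
    colour = {name: WHITE for name in heap}   # initially, all objects white
    grey = []
    for r in roots:                           # grey everything the root references
        if colour[r] == WHITE:
            colour[r] = GREY
            grey.append(r)
    while grey:                               # while grey objects exist
        g = grey.pop()                        # let g be any grey object
        colour[g] = BLACK
        for o in heap[g]:                     # each object o referenced by g
            if colour[o] == WHITE:
                colour[o] = GREY
                grey.append(o)
    return colour


example_heap = {"a": ["b"], "b": ["a", "c"], "c": [], "d": ["c"]}
example_colours = trace(example_heap, ["a"])
dead = sorted(n for n, c in example_colours.items() if c == WHITE)
```

In this example object "d" refers to a live object but is not itself reachable from the root, so it remains white and its space can be re-used.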

[0932] Marking GCs tend to implement this abstraction fairly literally, while copying GCs do not, with an object's colour implicitly determined by its absolute location in memory. The present invention is concerned mainly with marking GC algorithms and techniques.

[0933] In marking GCs, the colour of objects is stored within the object itself, as part of the object's header (12502 in FIG. 7E). The colour is encoded as mark information M, 12504, which is in one of four states, white, black, grey and free (that is, the object is available for allocation).

[0934] M will typically be a pair of bits which together allow the four distinct states to be encoded. Recolouring an object is a matter of altering the M state information in the object's header in the appropriate way. In the preferred embodiment, object coloration is stored for the lifetime of the object. Outside the operation of the GC, all objects are coloured white.
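
A two-bit encoding of the mark information M might look like the following sketch. The particular bit patterns assigned to each colour, the position of M in the two low-order bits of the header word, and the helper names `get_colour` and `set_colour` are all assumptions made for illustration; the specification states only that a pair of bits encodes the four states.

```python
# Assumed encoding of the four states in two bits.
WHITE, GREY, BLACK, FREE = 0b00, 0b01, 0b10, 0b11
MARK_MASK = 0b11  # assumed: M occupies the two low-order bits of the header


def get_colour(header):
    """Extract the mark information M from a header word."""
    return header & MARK_MASK


def set_colour(header, colour):
    """Return the header with M replaced and all other bits preserved."""
    return (header & ~MARK_MASK) | colour


header = 0xABC4                      # some header word whose M bits are white
greyed = set_colour(header, GREY)
blackened = set_colour(greyed, BLACK)
```

Recolouring therefore touches only the two M bits, leaving the remainder of the header word unchanged.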

[0935] Efficiency considerations dictate that the set of grey objects can be treated as a discrete entity that can be added to (by recolouring grey) or be removed from (by recolouring black). This set has conventionally been implemented as a stack. Usually the grey stack tends to be an explicit stack or an array, with an additional index variable to indicate where reads and writes in the array occur. FIG. 7F shows an empty stack 12602.

[0936] In a concurrent GC algorithm, other parts of the system can be altering objects while the GC is still tracing. Unless care is taken, live objects can be misidentified as dead. A typical way of eliminating this problem is to use a write barrier on all operations that could alter the contents of objects. Different implementations can work in different ways, but they all tend to require that non-GC threads of control can alter the set of grey objects.

[0937] In general terms, instead of having a single monolithic greyobject set which has to be locked as a whole on each access, the presentinvention divides the set into discrete segments, or packets, (see forexample 12406 in FIG. 7D), preferably such that each thread can beapportioned a segment it (and only it) can work on in isolation. Thiscan minimise the amount of locking required to the occasions when athread finishes with one packet and needs another to work on. Hence thepackets replace the grey stack entirely (which is why the arrow in FIG.7D from the stack to the heap is shown dotted).

[0938] Hence, the present invention involves so-called “grey packets”and in particular the provision of low-contention grey object sets forconcurrent marking garbage collection especially in a highlymulti-threaded environment.

[0939] Some GCs move objects in memory. The system used here preferablydoes not because of the difficulty of doing so in a concurrent GC.Instead, a ‘mark and sweep’ operation is performed. Here, everythingwhite is released at the end of the tracing or ‘mark’ process.Subsequent to the tracing process there is the sweep phase. In the sweepphase what is black is made white and what is white is made availablefor future use.

[0940] A grey packet manager (GPM) is provided by the technique described herein for managing the grey packets. The GPM comes into existence at the start of the program, but typically does not operate (except for housekeeping purposes) unless the garbage collector is also operating.

[0941] Any thread, especially but not limited to the GC thread, could make something grey. In, for example, a Virtual Machine (VM) the GPM is asked by the thread for its own memory for what is termed a grey packet in hand. One of the reasons for dividing the set of grey objects into separate packets is so that the thread has its own grey packet in hand. If the thread wants to continue writing into a grey packet which is full or very nearly so, the GPM gives that thread a new packet, takes away the full one and stores it. The GPM can keep a queue of empty packets in readiness. Any number of threads can have their own separate packets in hand, so that the grey stack can be divided into a number of regions of exclusive access, and no global locks are required.

[0942] Grey packets are like mini arrays, which are created and destroyed on demand. They are handled as complete packets. Grey packets typically are 256 bytes in size and can hold up to 60 references. It follows that only once in every 60 accesses does the thread using a grey packet need to communicate with the GPM. When there is no current GC there are no grey packets active.

[0943] The most useful features of this technique are that the amount of locking is minimised, there is dynamic creation and destruction of grey packets in accordance with demand, and there is the ability of the system to merge partially full packets so as to minimise memory requirements. Also, separation of full and partially full packets allows a degree of concurrency even within the GPM, so that the GPM as a whole is not a locked entity when a call is made to it.

[0944] A set of grey packets 12406, as schematically illustrated in FIG. 7D, exists as blocks within the program or system. Each block contains a fixed number of slots 12408 (each capable of describing a single object reference), and an indication of how many slots are currently in use within that block. In the preferred embodiment, checked-in packets are grouped in sets, preferably linked to form chains. The structure of an individual packet 12406 is shown in FIG. 7G. Each grey packet is either checked out, in which case it is currently being used by one (and only one) particular thread of control, or checked in, in which case no particular thread of control is using it.
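The packet block described above can be sketched, purely by way of example, as a small record holding a fixed slot array, an occupied count, a successor link for chaining, and an owner marker for the checked-out state. The class name, field names and the use of a thread identifier as the owner marker are assumptions of this sketch; only the slot count of 60 is taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SLOTS_PER_PACKET = 60  # capacity quoted in the text for a 256-byte packet


@dataclass
class GreyPacket:
    # Fixed number of slots, each able to hold one object reference.
    slots: List[Optional[object]] = field(
        default_factory=lambda: [None] * SLOTS_PER_PACKET)
    occupied: int = 0                          # how many slots are in use
    successor: Optional["GreyPacket"] = None   # chain link while checked in
    owner: Optional[int] = None                # assumed: thread id, None if checked in

    def is_full(self) -> bool:
        return self.occupied == len(self.slots)

    def is_empty(self) -> bool:
        return self.occupied == 0

    def push(self, ref: object) -> None:
        """Store a reference in the next unoccupied slot."""
        if self.is_full():
            raise OverflowError("grey packet is full")
        self.slots[self.occupied] = ref
        self.occupied += 1

    def pop(self) -> object:
        """Remove and return the most recently stored reference."""
        if self.is_empty():
            raise IndexError("grey packet is empty")
        self.occupied -= 1
        ref = self.slots[self.occupied]
        self.slots[self.occupied] = None
        return ref
```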

[0945] The grey packets are managed by a separate module within the program or system, the Grey Packet Manager, or GPM. The GPM maintains the following resources, internally:

[0946] full: a list of full packets.

[0947] partial: a list of partially full packets.

[0948] Each of the above lists has a separate lock to control access to it. A packet is checked in if it is present in either of the above lists.

[0949] Externally, the GPM offers the following fundamental services.

[0950] getEmptyPacket(): obtain an empty packet (or partially filled packet, but not a full packet) from the set of checked in packets, alter its status to checked out, and return it to the calling thread.

[0951] getFullPacket(): obtain a full packet (or partially filled packet, but not an empty packet) from the set of checked in packets, alter its status to checked out, and return it to the calling thread. Return NULL if only empty packets are present.

[0952] submitPacket(p): verify that grey packet p is currently checked out, and then alter its status to checked in.

[0953] The GPM performs each of the above operations under lock.

[0954] The GPM can handle the packets in any order it chooses; there is no system of “Last In, First Out”. Externally, the GPM is used with the following API:

    getEmptyPacket()
        acquire lock on partial list.
        let p be partial list head pointer.
        if p is NULL,
            allocate a new packet block p.
            initialize p's occupied field to 0.
        else
            let partial list head pointer be p's successor.
            while p is not completely empty and partial list head is not NULL,
                let m be the minimum of the number of occupied slots in p
                    and the number of unoccupied slots in partial list head packet.
                copy the contents of m occupied slots in p into unoccupied
                    slots in partial list head packet.
                increment occupied slots count in partial list head packet by m.
                decrement occupied slots count in p by m.
                if partial list head packet is full,
                    let f be partial list head pointer.
                    let partial list head pointer be f's successor.
                    submitFullPacket(f).
                endif
            endwhile
        endif
        release lock on partial list
        return p.

    getFullPacket()
        acquire lock on full list.
        if full is empty,
            release lock on full list
                ; as soon as the lock on the full list is released the full
                ; packet can be used - this allows some degree of concurrency
                ; even within the GPM
            acquire lock on partial list.
            let p be partial list head pointer.
            if p is not NULL,
                let partial list head pointer be p's successor packet.
            endif
            release lock on partial list
        else
            let p be full list head pointer.
            let full list head pointer be p's successor packet.
            release lock on full list
        endif
        return p.

[0955]

    submitFullPacket(p)
        acquire lock on full list.
        let p's successor packet be full list head packet.
        let full list head pointer be p.
        release lock on full list

    submitEmptyPacket(p)
        deallocate grey packet block pointed to by p.
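A minimal runnable sketch of the API listing above is given here for illustration, assuming Python's standard threading.Lock for the two list locks. The class and method names, the small CAPACITY value and the submit_partial_packet helper (which the listing does not define) are assumptions of the sketch rather than part of the described system.

```python
import threading

CAPACITY = 4  # kept small for illustration; the text quotes 60 slots


class Packet:
    def __init__(self):
        self.slots = [None] * CAPACITY
        self.occupied = 0
        self.successor = None


class GreyPacketManager:
    def __init__(self):
        self.full_head = None
        self.partial_head = None
        self.full_lock = threading.Lock()     # guards the full list only
        self.partial_lock = threading.Lock()  # guards the partial list only

    def submit_full_packet(self, p):
        with self.full_lock:
            p.successor = self.full_head
            self.full_head = p

    def submit_partial_packet(self, p):
        # Assumed helper: the listing does not show how partial packets arrive.
        with self.partial_lock:
            p.successor = self.partial_head
            self.partial_head = p

    def get_empty_packet(self):
        with self.partial_lock:
            p = self.partial_head
            if p is None:
                return Packet()  # nothing to fuse: allocate a new block
            self.partial_head = p.successor
            p.successor = None
            # Fuse: pour p's contents into the packets left on the list.
            while p.occupied > 0 and self.partial_head is not None:
                head = self.partial_head
                m = min(p.occupied, CAPACITY - head.occupied)
                for _ in range(m):
                    p.occupied -= 1
                    head.slots[head.occupied] = p.slots[p.occupied]
                    p.slots[p.occupied] = None
                    head.occupied += 1
                if head.occupied == CAPACITY:
                    self.partial_head = head.successor
                    self.submit_full_packet(head)
            return p  # empty, or at least less full than before

    def get_full_packet(self):
        with self.full_lock:
            p = self.full_head
            if p is not None:
                self.full_head = p.successor
                p.successor = None
                return p
        # Full list was empty and its lock is already released here.
        with self.partial_lock:
            p = self.partial_head
            if p is not None:
                self.partial_head = p.successor
                p.successor = None
            return p
```

In this sketch the partial lock may be held while the full lock is taken, but never the reverse, so the two locks cannot deadlock against each other.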

[0956] Each thread of control (including the GC) has a thread local packet-in-hand (or tl-pih) grey packet pointer. This pointer may be NULL (indicating that the thread has no packet in hand), but if non-NULL it must refer to a checked out packet. Marking an object i as grey becomes:

    if tl-pih is NULL then
        tl-pih = getEmptyPacket()
    else if tl-pih is full then
        submitFullPacket(tl-pih)
        tl-pih = getEmptyPacket()
    endif
    recolor i grey
    set the next unoccupied slot in tl-pih to be i.
    increment the occupied slots field in tl-pih (that is, insert i into tl-pih).

A packet is said to be full if its occupied field matches the maximum number of slots possible in the packet. The main blackening algorithm becomes:

    obtain a packet p to blacken
    while p is not NULL do
        for each reference g in p
            recolor g black
            for each object i referenced from g do
                if i is white then
                    mark i as grey
                endif
            endfor
        endfor
        submitEmptyPacket(p)
        obtain a packet p to blacken
    endwhile

Obtaining a packet to blacken is:

    if tl-pih is not NULL then
        let p be tl-pih
        tl-pih = NULL
    else
        let p = getFullPacket()
    endif
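The marking and blackening steps can be exercised with the following sketch, given by way of example only. It assumes a deliberately simplified stand-in for the GPM (a single locked list of Python lists) so that the block is self-contained; the names Obj, SimpleGPM, mark_grey and blacken_all are hypothetical, and threading.local is used to hold the packet in hand.

```python
import threading

WHITE, GREY, BLACK = "white", "grey", "black"
CAPACITY = 2  # tiny packets so that the full-packet path is exercised


class Obj:
    def __init__(self, name):
        self.name = name
        self.colour = WHITE
        self.refs = []


class SimpleGPM:
    """Simplified stand-in for the GPM: one locked list of packets."""

    def __init__(self):
        self._lock = threading.Lock()
        self._packets = []

    def get_empty_packet(self):
        return []  # a packet is modelled as a plain list

    def submit_full_packet(self, p):
        with self._lock:
            self._packets.append(p)

    def get_full_packet(self):
        with self._lock:
            return self._packets.pop() if self._packets else None


gpm = SimpleGPM()
tl = threading.local()  # holds this thread's packet in hand (tl.pih)


def mark_grey(obj):
    pih = getattr(tl, "pih", None)
    if pih is None:
        pih = gpm.get_empty_packet()
    elif len(pih) == CAPACITY:
        gpm.submit_full_packet(pih)  # the only locked step on this path
        pih = gpm.get_empty_packet()
    obj.colour = GREY
    pih.append(obj)
    tl.pih = pih


def obtain_packet_to_blacken():
    pih = getattr(tl, "pih", None)
    if pih is not None:
        tl.pih = None
        return pih
    return gpm.get_full_packet()


def blacken_all():
    p = obtain_packet_to_blacken()
    while p is not None:
        for g in p:
            g.colour = BLACK
            for i in g.refs:
                if i.colour == WHITE:
                    mark_grey(i)
        # submitEmptyPacket(p): here the list is simply dropped
        p = obtain_packet_to_blacken()
```

Because the packet in hand is detached before it is blackened, objects greyed during blackening go into a fresh packet and the packet being iterated is never modified.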

[0957] The idea is that both the marking and blackening processes operate only on the thread's packet in hand, which if present at all can be guaranteed not to be visible to any other thread. Hence, most of the time no locking is required, except when interaction with the GPM is required to submit packets, obtain empty packets or packets to blacken.

[0958] Periodically each non-GC thread submits any packet in hand back to the GPM (only the GC can blacken packets). This is typically done when the GC needs to examine a non-GC thread's local data structures. Since these packets may be only partially full, this is how the partial list in the GPM gains entries. Since it is desirable to have as few grey packets allocated as possible, getEmptyPacket() prefers where possible to make empty packets from the partial list by “fusing” the contents of two partial packets into a single, fuller packet, leaving behind an empty (or at least less full) packet which can be returned to the caller. A completely new empty packet is only created if the partial packet list is empty.

[0959] As will be seen from the above, the primary aim of this technique is to improve the performance of Concurrent GC in highly multi-threaded environments, by virtue of minimising locked accesses to a global data structure. Hence a commercial product utilising Concurrent GC with this technique can be expected to perform better than one using a more traditional approach.

[0960] A summary of some of the main functions of the Grey Packet Manager is presented in the table below. In the table, each function is shown underlined; the steps of that function follow the function itself. Each step is placed in one or two of three columns (“Full Packet”, “Partial Packet” or “Empty Packet”), depending on whether the step is performed using full, partial or empty packets.

Full Packet    Partial Packet    Empty Packet

[0961] Marking phase: proceeds in the following repeated stages until there are no more objects to mark

[0962] (a) getEmptyPacket (get a new empty packet and mark it as grey)

[0963] (b) submitFullPacket (submit a full grey packet)

[0964] (c) getEmptyPacket (get a further new empty packet)

[0965] Blackening phase: this proceeds repetitively until step (a) fails

[0966] (a) getFullPacket (for blackening purposes)

[0967] (b) submit “Empty” Packet (into the GPM)

[0968] Death of a thread

[0969] On death of thread, submit any tl-pih back to the GPM

[0970] General housekeeping

[0971] GC periodically submits tl-pih's of other threads into GPM

[0972] Referring finally to FIG. 7H, the overall function of the preferred embodiment is now summarised, with particular reference to the flow of packets between the various main components.

[0973] In FIG. 7H, the grey packet manager (GPM) is denoted 12700, the garbage collector (GC) is denoted 12702, various threads of control (‘mutators’) are denoted 12704, 12706 and 12708, and the packets are denoted 12406. Thread 12708 represents the ‘nth’ mutator, and shows no flow of packets since it has not had a write barrier to trigger. The various packet flows are denoted by encircled numerals, whose meaning is as follows:

[0974] 1) Get new empty packet

[0975] 2) Submit full packet

[0976] 3) Submit partial packet

[0977] 4) Get full packet to blacken

[0978] 5) Submit empty packet

[0979] A general summary of GC technology, concurrent and otherwise, can be found in “Garbage Collection: Algorithms for Automatic Dynamic Memory Management” by Richard Jones and Rafael Lins, published by John Wiley, ISBN 0-471-94148-4. The disclosure of this document is hereby incorporated by reference.

[0980] In any or all of the aforementioned, certain features of the present invention have been implemented using computer software. However, it will of course be clear to the skilled man that any of these features may be implemented using hardware or a combination of hardware and software. Furthermore, it will be readily understood that the functions performed by the hardware, the computer software, and such like are performed on or using electrical and like signals.

[0981] Features which relate to the storage of information may be implemented by suitable memory locations or stores. Features which relate to the processing of information may be implemented by a suitable processor or control means, either in software or in hardware or in a combination of the two.

[0982] In any or all of the aforementioned, the invention may be embodied in any, some, or all of the following forms: it may be embodied in a method of operating a computer system; it may be embodied in the computer system itself; it may be embodied in a computer system when programmed with or adapted or arranged to execute the method of operating that system; and/or it may be embodied in a computer-readable storage medium having a program recorded thereon which is adapted to operate according to the method of operating the system.

[0983] As used herein throughout the term ‘computer system’ may be interchanged for ‘computer,’ ‘system,’ ‘equipment,’ ‘apparatus,’ ‘machine,’ and like terms. The computer system may be or may include a virtual machine.

[0984] In any or all of the aforementioned, different features and aspects described above, including method and apparatus features and aspects, may be combined in any appropriate fashion.

[0985] It will be understood that the present invention(s) has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.

[0986] Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

[0987] Agent's Reference No. 8 - Computer System, Computer-Readable Storage Medium and Method of Operating Same, and Method of Operating That System

[0988] The present invention relates to a computer system and a method of operating a computer system. The invention preferably relates to a computer system and method for handling interrupts. The invention finds particular (but not exclusive) application in relation to virtual machines and/or in multi-threaded environments. A preferred embodiment of the invention relates to executing device driver interrupt handlers written in Java.

[0989] Interrupt handlers in computer systems are used as a way of managing communications between the CPU and other devices (normally hardware items) connected to it. The CPU and the device interact with each other through software known as a device driver, which is unique to that particular device. The word ‘device’ may include such familiar items as a keyboard, printer, mouse, scanner and the like; in fact, any input or output device.

[0990] In the implementation of device driver software, it is usually required that device interrupts be dealt with by code within the driver itself.

[0991] The code written as part of the driver to deal with such interrupts usually has significant constraints placed upon it; this is because such code can be executed at almost any time at all in relation to the main-line application, often using a small, fixed-size, separate stack provided by the operating system. Thus the operating system handles the interrupts directly.

[0992] In the case of a computer system including non-native software, in view of the acknowledged difficulty of dealing with interrupts, an interrupt handler is provided by the (host) operating system. It is important that interrupts are dealt with as soon as possible and so the operating system is chosen to handle interrupts. There are two main problems with such a technique, both stemming from the fact that the non-native system is significantly divorced from the real interrupt level. Firstly, there is the problem that special device handling (reading/writing special values from/to device registers) may no longer be valid since the non-native system did not deal directly with the interrupt and the real interrupt has already been dismissed by the host system before the non-native system is informed of the interrupt. Secondly, a substantial length of time may have elapsed between the real interrupt occurring and the non-native code relating to it actually executing.

[0993] According to a first aspect of the present invention, there is provided a computer system including a native operating system and non-native software, wherein the non-native software is arranged to handle interrupts directly. Thus, rather than the interrupt being handled directly by the operating (host) system and the non-native software being informed later about the interrupt occurring, the interrupt is handled synchronously by the non-native software. Preferably, the non-native software includes an interpreted language. In a particularly preferred embodiment of the invention, the non-native software includes the Java language. The invention is also applicable to Java-like languages. Preferably, the computer system is structured as or is implementing a virtual machine.

[0994] According to the first aspect of the present invention, there is also provided a method of operating a computer system, the computer system including a native operating system and non-native software, wherein interrupts are handled directly by non-native software.

[0995] In a preferred embodiment, this invention relates to the full implementation of a device driver including its interrupt handlers in non-native software, in particular in Java (although many parts of the invention would certainly apply to other interpreted languages as well).

[0996] In the use of a non-native language, in particular Java, at the interrupt level, various problems are encountered. For example, the fact that Java (like some other languages) uses garbage collection adds to the complexity of this problem, in that interrupt handler code may need to run successfully at any arbitrary point in the garbage collection process without interfering with it or failing itself in some way due to it. This and other problems are discussed in more detail below.

[0997] The only prior proposal the applicant is aware of for handling this problem involved the (Java) Virtual Machine (VM) having its own dedicated interrupt handlers implemented in a non-interpreted language (Assembler or C) which handled the interrupt in a generic way and then dismissed it before passing a note of its occurrence to a high priority Java thread running at non-interrupt-level.

[0998] There are two main problems with such a technique, both stemming from the fact that the Java code written to handle device interrupts is significantly divorced from the real interrupt level. Firstly, special device handling (reading/writing special values from/to device registers) may no longer be valid, depending on the type of device in question, because the real interrupt has already been dismissed. Secondly, a substantial length of time may have elapsed between the real interrupt occurring and the Java code to handle it actually executing.

[0999] Java must adhere to certain rules and it was thought that the use of Java for interrupt handling would not be practical because it might be difficult to ensure that there could be adherence to the rules inherent to Java at the interrupt level.

[1000] For example, when a piece of code requires a semaphore, such as a mutex (e.g., to prevent simultaneous insert and remove operations in code that maintains a queue), no operation is carried out until the semaphore is acquired. Semaphores cannot be acquired at the interrupt level. If a semaphore had already been acquired at a non-interrupt level, no reliable action is possible at the interrupt level.

[1001] With Java (and other interpreted languages) there may also be problems with code management.

[1002] The problem with the prior art is that it cannot handle interrupts in real time. For example, in the proposal indicated above, the system just makes a request to process the interrupt in a normal thread as soon as possible.

[1003] The fact that the non-native (for example, Java) code never runs at the real interrupt level in this solution does, however, substantially alleviate the problems with garbage collection indicated above.

[1004] The solution to the problem of actually getting non-native code to run at real interrupt level was broken down into sub-problems, solved as follows; in the following any, some, or all of the sub-problems and any of the solutions may be combined in any appropriate way:

[1005] According to a second aspect of the invention, there is provided a computer system including an interrupt handling method, wherein the execution of the interrupt handling method is arranged to be initiated in advance of the first interrupt, the execution proceeding to a waiting state and being caused to resume when an interrupt occurs. Preferably, the interrupt handling method is arranged to be initiated on initialisation of the computer system.

[1006] In one embodiment of the invention, a special interrupt handler method is provided by the non-native software. When an interrupt occurs, the method is called. On calling the method, various steps, for example the setting up of necessary stacks and frames, need to be performed. When the interrupt has been dealt with, the method is closed. Different methods could be written for different interrupts, i.e., different IRQs, each method therefore handling the particular interrupt in an appropriate manner.

[1007] In accordance with the second aspect of the invention, the interrupt method is preferably opened as a part of the set up of the system, so that the method is ready and waiting when an interrupt occurs. Thus, execution of the interrupt handler, and thus the handling of the interrupt, can be faster. That is of particular importance where the non-native language is an interpreted language. Interpretation is relatively slow compared with the execution of native instructions. Since it is important to deal with interrupts as quickly as possible, it would have been considered desirable for the interrupts to be handled by the operating system. In accordance with the second aspect of the invention, however, it is possible for the handling of the interrupt by non-native code to be faster by providing a stack ready for use when the interrupt occurs. Thus, at least some of the loss of execution speed inherent in the use of an interpreted language to handle interrupts can be avoided.

[1008] Preferably, the interrupt handling method is arranged to generate a stack on initiation, the stack persisting in the waiting state. Preferably, the interrupt handling method is arranged to include an execution thread on initiation, the thread being made permanently inactive in such a way that the stack persists in the waiting state. Thus, the steps have already been taken to open the stack and the interrupt can be dealt with as soon as it occurs.

[1009] In some cases, for example, where Java is used as the non-native language, it is possible to destroy the thread completely. In many cases, this is preferred since the memory used by the thread can be released. In other cases, the destruction of the thread will not be possible. In such cases, the thread will lie dormant until an interrupt occurs.

[1010] Preferably, the method is arranged so that the thread is apparently reactivated by having interrupt flow of control using the stack when an interrupt occurs. The reactivation is preferably the result of having interrupt flow of control using the stack.

[1011] The interrupt flow of control switches to the stack of the interrupt handling method so that it appears that the interrupt handling method thread has been reactivated.
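The idea of a handler that is started in advance, parks in a waiting state with its frame intact, and resumes on each interrupt can be illustrated loosely with a Python generator. This is an analogy only and an assumption of this sketch: a generator frame stands in for the persisting stack, and the names interrupt_handler and on_interrupt are hypothetical. No real interrupt level is involved.

```python
def interrupt_handler(device_log):
    # One-off set-up runs here, before any interrupt has arrived.
    handled = 0
    while True:
        # Waiting state: execution is suspended here and the local
        # variables (the analogue of the stack) persist until resumed.
        irq = yield handled
        device_log.append(irq)  # deal with the interrupt
        handled += 1


log = []
handler = interrupt_handler(log)
next(handler)  # initiate in advance: run up to the first waiting state


def on_interrupt(irq):
    """Resume the waiting handler; no frame set-up is needed at this point."""
    return handler.send(irq)
```

Each call to on_interrupt resumes the suspended frame directly, so the cost of entering the method is paid once at initialisation rather than on every interrupt.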

[1012] Preferably, the interrupt handling method includes a plurality of different waiting states. Thus, it is possible in accordance with the second aspect of the invention for various different types of interrupts for a given device to be dealt with using a single interrupt handler method.

[1013] The second aspect of the invention also provides a computer system having a non-native interrupt thread stack waiting to be switched to when an interrupt occurs. Preferably, the interrupt thread stack is a Java stack.

[1014] Preferably, the non-native interrupt thread stack is partially filled.

[1015] Preferably, the computer system is structured as or implements a virtual machine.

[1016] Preferably, the computer system of the second aspect of the invention also includes features of the first aspect of the invention.

[1017] The second aspect of the present invention also provides a method of operating a computer system, the method including initiating an interrupt handling method in advance of the first interrupt, the execution of the method proceeding to a waiting state, the method resuming when an interrupt occurs.

[1018] The second aspect of the invention also provides a method of handling an interrupt in a computer system, wherein the interrupt handling method is terminated mid-method at the waiting state, leaving a stack. Preferably, the thread of the interrupt handler method is apparently reactivated, preferably by having interrupt flow of control using the stack when an interrupt occurs. In a preferred embodiment of the invention, the interrupt handler method is a non-native interrupt method, preferably a Java interrupt method.

[1019] In one of its most general aspects, the invention includes a computer system or a method of operating a computer system in which a non-native (preferably a Java) thread stack is kept ready and waiting to be switched to when an interrupt is detected.

[1020] In the case of an interpreted language being used for the interrupt handler, there would be a large overhead in entering the interrupt handler method if the method were called when an interrupt occurred.

[1021] In its preferred form, the invention lies in the context of a software VM and the significant feature of the invention is that the system or method runs non-native (preferably Java) bytecode at interrupt level.

[1022] In summary, a problem was seen to be that real interrupt level runs on a small, separate OS-supplied stack which is unsuitable for use by the non-native bytecode execution engine. Embodiments of the present invention have a normal non-native thread stack ready and waiting to be switched to when an interrupt occurs.

[1023] In the second aspect, the invention provides a method of implementing device driver interrupts in a computer system structured as a virtual machine, the method including having a special interrupt stack ready to run the instant an interrupt call is received.

[1024] In a preferred form of the invention as set out in the preceding paragraph, the system is ready to run an interpreted language (e.g., Java code). In a modification, the special interrupt thread stack is a normal (Java) thread stack which is switched to when an interrupt occurs.

[1025] The invention also extends to a computer system provided with means for implementing device driver interrupts, including a special interrupt stack ready to run the instant an interrupt call is received.

[1026] In a preferred form of the invention as set out in the preceding paragraph, the system is ready to run an interpreted language (e.g., Java code). In a modification, the special interrupt thread stack is a normal (Java) thread stack that is switched to when an interrupt occurs.

[1027] Preferably, the system is such that potentially blocking synchronisation operations are not made while the interrupt handler method is executed.

[1028] It is important that no potentially blocking synchronisation operations are carried out during interrupt handling.

[1029] In accordance with a third aspect of the present invention, there is provided a computer system including an interrupt handler including a non-native interrupt handler, the system including means for carrying out first-time execution activity in advance of the first interrupt.

[1030] In many cases, first time execution activities include semaphores that are unavoidable (for example, those used in class loading). By carrying out such activities before the interrupts occur (for example, on initialisation of the system), the use of such semaphores at interrupt level can be avoided.

[1031] Preferably, the code of the interrupt handler is pre-resolved. Thus, steps which unavoidably involve mutexes, for example class resolution, can be carried out before interrupt level handling occurs. Preferably, the code of the interrupt handler is pre-compiled.
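The effect of pre-resolution can be illustrated with the following sketch, offered under stated assumptions: a hypothetical Resolver caches resolved symbols, takes a mutex only on first-time resolution, and counts how often the mutex is taken. The names Resolver, preresolve and lock_acquisitions are inventions of this sketch, and a dictionary lookup stands in for class resolution in a real virtual machine.

```python
import threading


class Resolver:
    """Resolves named symbols lazily; only first-time resolution takes the lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cache = {}
        self.lock_acquisitions = 0  # instrumentation for the illustration

    def resolve(self, name, loader):
        cached = self._cache.get(name)
        if cached is not None:
            return cached  # normal route: no synchronisation operation
        with self._lock:   # first-time route: potentially blocking
            self.lock_acquisitions += 1
            if name not in self._cache:
                self._cache[name] = loader()
            return self._cache[name]


def preresolve(resolver, symbols):
    """Carry out all first-time resolution before any interrupt can occur."""
    for name, loader in symbols.items():
        resolver.resolve(name, loader)
```

After preresolve has run at initialisation, every later resolve call made by the handler for those symbols follows the lock-free route.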

[1032] Preferably, the computer system is structured as or implements a virtual machine.

[1033] Preferably, the computer system of the third aspect of the invention also includes features of the first and/or second aspects.

[1034] Preferably the method of handling interrupts includes not making any potentially blocking synchronisation operations while executing the interrupt handling method.

[1035] In accordance with the third aspect of the invention, there is provided a method of handling interrupts in a computer system using a non-native interrupt handler method, the method including carrying out first-time execution activity in advance of the first interrupt.

[1036] Preferably, the method includes the step of pre-resolving the code of the interrupt handler method, and preferably includes the step of pre-compiling the code of the interrupt handler method.

[1037] In summary, the bytecode execution engine must not attempt any potentially blocking synchronisation operations while executing the bytecode of an interrupt handler.

[1038] In accordance with the third aspect of the invention, it can be ensured that the normal routes through the bytecode execution engine have no potentially blocking synchronisation operations; this is desirable from a performance point of view anyway. Additionally, it is preferable to make sure that the nature of the bytecode of an interrupt handler never requires other than the normal routes through the bytecode execution engine.

[1039] The second aspect of the invention also provides a method of implementing device driver interrupts in a computer system that is structured as or is implementing a virtual machine, the method including preventing the bytecode execution engine from attempting any potentially blocking synchronisation operations while executing the bytecode of the interrupt handlers.

[1040] The second aspect of the invention further extends to a computer system provided with means for implementing device driver interrupts, including means for preventing the bytecode execution engine from attempting any potentially blocking synchronisation operations while executing the bytecode of the interrupt handlers.

[1041] According to a fourth aspect of the invention, there is provided a computer system including an interrupt handler and a garbage collector, the system being such that interaction between the interrupt handler and the garbage collector is prevented.

[1042] The fourth aspect of the invention applies particularly to non-native software having a garbage collection system.

[1043] If the interrupt level were to, for example, put an object on a heap to which a garbage collector (GC) had access, the GC might alter the object, for example by trying to perform garbage collection or even by just looking at it.

[1044] Preferably, the interrupt handler includes objects, the objects of the interrupt handler being isolated from the GC. Preferably, the interrupt handler includes a heap, the heap being isolated from the GC. Thus the GC is not able to alter or collect any objects belonging to the interrupt handler.

[1045] Preferably, the system further includes means for preventing alteration of reference fields in interrupt handler objects other than by the interrupt handler. Thus, the interrupt handler can also be protected from interference by non-interrupt level threads. Preferably, the interrupt level is not able to directly alter or contact any non-interrupt level objects. Thus, preferably, the interrupt level is completely isolated from the non-interrupt level.

[1046] Preferably, the computer system is structured as or implements avirtual machine.

[1047] Preferably, the computer system of the fourth aspect of theinvention also includes features of the computer system of the first,second and/or third aspects of the invention.

[1048] The fourth aspect of the invention also provides a method ofoperating a computer system including an interrupt handler and a GC,wherein interaction between the interrupt handler and the GC isprevented.

[1049] Preferably, the interrupt handler device includes objects,wherein alteration of reference fields in interrupt handler objectsother than by the interrupt handler is prevented.

[1050] In summary, the bytecode execution engine must not do anything that could interfere with or fail because of any phase of garbage collection occurring (potentially simultaneously) at non-interrupt level. This can be achieved in a preferred embodiment by denying interrupt level code the full flexibility of the garbage collected Java heap.

[1051] The invention further provides a method of implementing device driver interrupts in a computer system structured as or implementing a virtual machine, the method including preventing the bytecode execution engine from interfering with simultaneous garbage collection at non-interrupt level.

[1052] The invention further extends to a computer system provided with means for implementing device driver interrupts, including means for preventing the bytecode execution engine from interfering with simultaneous garbage collection at non-interrupt level.

[1053] A fifth aspect of the invention provides a computer system structured as or implementing a virtual machine, the system including a non-native interrupt handler at the interrupt level, the system including means for enabling information from the interrupt level to pass to other levels. While communication between the interrupt level and the non-interrupt level is necessary, the interrupt level must use a special technique to communicate information to the non-interrupt level, so as to avoid any potential interference in the handling of the interrupts.

[1054] Preferably, the system includes means for using native calls to pass information from the interrupt level to other levels. Thus information can be passed to non-interrupt level while minimising the risk of disturbance during interrupt handling.

[1055] Preferably, the computer system of the fifth aspect includes features of the computer system of the first, second, third and/or fourth aspects.

[1056] The fifth aspect of the invention also provides a method of operating a computer system structured as or implementing a virtual machine including a non-native interrupt handler at interrupt level, the method including passing information from the interrupt level to other levels.

[1057] In summary, the inventions of the third and fourth aspects would seem to indicate that communication between interrupt-level Java and non-interrupt-level Java is hard, if not impossible. In preferred embodiments of the invention, a special mechanism is made available to the Java application programmer to enable the passing of information from the Java code that runs at the interrupt level to the rest of the application.

[1058] The manner in which these sub-problems were approached and overcome will be explained in later sections of the particular description.

[1059] The invention yet further provides a method of implementing device driver interrupts in a computer system structured as or implementing a virtual machine, the method including enabling information from the (Java) code running at the interrupt level to pass to the rest of the application.

[1060] The invention has the advantage of enabling interrupt handler code to run successfully at any point in the garbage collection process without interference.

[1061] The invention yet further extends to a computer system provided with means for implementing device driver interrupts, including means for enabling information from the (Java) code running at interrupt level to pass to the rest of the application.

[1062] The invention has the advantage of enabling interrupt handler code to run successfully at any point in the garbage collection process without interference.

[1063] The invention also provides a computer programmed to carry out a method according to any of the aforementioned aspects of the invention.

[1064] The invention also provides a computer-readable storage medium having a programme recorded thereon for carrying out the method of the first, second, third, fourth and/or fifth aspect of the invention.

[1065] Embodiments of the invention will now be described purely by way of example. Reference will be made, where appropriate, to the accompanying figures of the drawings (which represent schematically the above improvements) in which:

[1066] FIG. 8A shows parts of a PC computer system for dealing with an interrupt;

[1067] FIG. 8B shows steps in the handling of an interrupt in an embodiment;

[1068] FIG. 8C illustrates code of an interrupt handler; and

[1069] FIG. 8D illustrates apparatus for carrying out an embodiment.

[1070] By way of background, in a PC-configured computer system, such as schematically illustrated in FIG. 8A, the CPU 18102 and its associated RAM 18104 are electrically connected to a first one 18106 of two (usually) circuit blocks 18106, 18108 known as Programmable Interrupt Controllers or PIC circuits. Each PIC has a total of 8 terminals or pins to which electrical connection may be made. Conventionally, pin No. 2 of the first PIC 18106 is connected to the input of the second PIC 18108. The seven remaining pins of PIC 18106 plus the eight pins of PIC 18108, i.e., 15 in all, are available for electrical connection to further devices, such as those mentioned above.

[1071] The number of the pin to which a device is connected becomes its identity, or rather its IRQ number. So, a keyboard connected to pin number 5 would have an IRQ=5 label. The CPU communicates with a list 18110 of 15 interrupt level code addresses (this is in RAM) so that when the CPU receives a signal on pin 5, for example, it can activate the corresponding code address in the list and generate a corresponding output.
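The list of interrupt level code addresses described above can be pictured as a table indexed by IRQ number. The following is a minimal Java sketch under stated assumptions: the class name `IrqTableSketch`, the 0 to 15 numbering of lines, and the use of `Runnable` entries in place of raw code addresses are illustrative choices and are not taken from the figures.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: a table of handlers indexed by IRQ number.
final class IrqTableSketch {
    static final int PINS_PER_PIC = 8;
    static final int CASCADE_PIN = 2; // pin of the first PIC wired to the second PIC

    // One slot per line 0..15; the cascade slot is never given to a device (assumed numbering).
    private final Runnable[] handlers = new Runnable[2 * PINS_PER_PIC];
    final List<String> log = new ArrayList<>();

    // Seven remaining pins on the first PIC plus eight on the second.
    static int usableLines() {
        return (PINS_PER_PIC - 1) + PINS_PER_PIC;
    }

    void register(int irq, Runnable handler) {
        if (irq < 0 || irq >= handlers.length || irq == CASCADE_PIN) {
            throw new IllegalArgumentException("IRQ " + irq + " is not available to devices");
        }
        handlers[irq] = handler;
    }

    // Looks up the entry for the signalled line and runs it; false if nothing is registered.
    boolean dispatch(int irq) {
        Runnable h = (irq >= 0 && irq < handlers.length) ? handlers[irq] : null;
        if (h == null) {
            return false;
        }
        h.run();
        return true;
    }
}
```

In this sketch a device on pin 5 would be registered with `register(5, ...)`, and a later `dispatch(5)` plays the part of the CPU activating the corresponding code address.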

[1072] The PIC signals to the CPU by raising the voltage on the line connecting it to the CPU. This signal is the device interrupt. In some cases, such a signal is sent to the CPU after every character has been sent to the corresponding device. Once the CPU has finished the current job, i.e., the machine instruction it is working on at the time, rather than for example the printing of a whole document or page of text, it responds to the interrupt signal and activates the corresponding device address to take the appropriate action, e.g., for the next character to be sent to the device.

[1073] It can readily be appreciated, therefore, that the numbers of interrupts demanding attention from the CPU can be enormous and a satisfactory way of managing them is essential for efficient operation of the PC containing that CPU. In some instances it is necessary for a section of a job with a high priority to be protected from interference by the interrupt. In such cases, the section of code being processed will be preceded by a ‘clear interrupt’ or CLI instruction that prevents the interrupt from being acknowledged until a ‘set interrupt’ or STI code at the end of the section of code is reached. In order to enhance this protection, the CPU may switch to a physically separate interrupt stack so as to reduce yet further the risk that an interrupt may interfere with the process already taking place in the CPU.

[1074] Communication between the CPU and a device connected through a COM port in the PC, for example, generally takes place at a relatively modest speed compared to the processor speed; communication is slow. When the CPU makes a mainline call to write to the device, it takes the first character of the call and writes it to the COM port hardware. The CPU then returns to the job it was doing, e.g., repaginating a document in a word processing package.

[1075] When the interrupt is received and the CPU has established which device raised the interrupt and what the CPU should do in response, the CPU stores enough data to allow it to leave the current process and handle the interrupt. The stored list includes start addresses which enable the CPU to say ‘when interrupt code arrives, go to X’, where ‘X’ represents the appropriate response to the interrupt. Clearly, the interrupt handler was unaware of the state of the CPU when the interrupt arrived, hence the need mentioned above to separate the handler from the process data.

[1076] An Interrupt Return (IRET) Code is located at the end of the interrupt to tell the CPU that the interrupt is completed and for the CPU to effect a return to the process it was operating before the interrupt.

[1077] A virtual machine allows software which has been written for one operating system to run on another operating system; the software is then termed ‘non-native’ software. In order to allow the non-native software to run, the virtual machine emulates the operation of the operating system for which the software was written. The virtual machine translates the instructions of the non-native software into native instructions which can be run by the host operating system. Conventional emulators work by interpreting the non-native instructions during execution.

[1078] Any execution path, or ‘thread’, will have a stack associated with it. A stack is an area in memory that stores frames consisting of temporary register information and return addresses of subroutines.

[1079] So far, no specific mention has been made of the language in which the device drivers are written. In preferred embodiments, these drivers are in Java, and the remainder of this section is concerned especially, but not exclusively, with the solution of problems arising with the implementation of device drivers in Java or other interpreted languages.

[1080] Further details of how the problems mentioned earlier were solved are as follows:

[1081] In a known proposal, the real interrupt level runs on a small, separate OS-supplied stack which is unsuitable for use by the Java bytecode execution engine. Preferred embodiments of the invention have a normal Java thread stack ready and waiting to be switched to when an interrupt occurs; this is achieved by having a normal Java thread created as part of the Java application start-up code partially destroy itself by a call on a special native method, waitForFirstInterrupt.

[1082] In summary, an interrupt handler method can be represented generically as follows:

waitForFirstInterrupt
while (true) do {
    something
    waitForNextInterrupt
}

[1083] The second line ‘while (true)’ executes an infinite loop. Upstream of the ‘waitForFirstInterrupt’ is a real Java thread with a separate stack and real Operating System (OS) thread. The interrupt handler Java thread and its associated stack are formed in the initiation of the system. The method then waits at ‘waitForFirstInterrupt’ until an interrupt occurs and, as far as the OS is concerned, the thread is terminated but the stack itself persists. The stack is effectively ready and waiting to go as soon as the interrupt occurs. When the first interrupt occurs the interrupt flow of control deals with the interrupt using the terminated thread's stack. After the interrupt has been dealt with, the interrupt handler method again lies dormant, this time at ‘waitForNextInterrupt’ until another interrupt occurs.
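The control flow of the generic handler method can be imitated in ordinary Java, purely as an illustration. In the sketch below the two wait methods are hypothetical stand-ins that take from a queue on a normal thread, whereas in the described embodiment they are native methods that leave a stack without an OS thread. The `SHUTDOWN` sentinel is also an assumption, added only so that the demonstration terminates; the loop in the description is infinite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simulation of the handler method's shape; not the native mechanism itself.
final class HandlerLoopSketch {
    private static final int SHUTDOWN = -1; // assumption: lets the demo end
    private final BlockingQueue<Integer> pending = new LinkedBlockingQueue<>();
    private final List<String> log = new ArrayList<>();

    // Hypothetical stand-ins for the native methods named in the text.
    private int waitForFirstInterrupt() throws InterruptedException { return pending.take(); }
    private int waitForNextInterrupt() throws InterruptedException { return pending.take(); }

    private void handlerMethod() throws InterruptedException {
        int irq = waitForFirstInterrupt();      // dormant until the first interrupt
        while (irq != SHUTDOWN) {
            log.add("handled IRQ " + irq);      // the 'something' of the pseudo code
            irq = waitForNextInterrupt();       // dormant again until the next one
        }
    }

    // Starts the handler, raises two simulated interrupts, then stops it.
    static List<String> demo() {
        HandlerLoopSketch s = new HandlerLoopSketch();
        Thread handler = new Thread(() -> {
            try {
                s.handlerMethod();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        handler.start();
        s.pending.add(5);
        s.pending.add(7);
        s.pending.add(SHUTDOWN);
        try {
            handler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return s.log;
    }
}
```

The point of the sketch is only that execution resumes immediately after the wait call each time, rather than at the top of the method.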

[1084] A more complete explanation of the sequence of events and the corresponding pseudo code will be given with reference to FIGS. 8B and 8C respectively.

[1085] FIG. 8B illustrates the sequence of events for various system components involved in handling an interrupt, while FIG. 8C is a summary of the corresponding pseudo code of a Java interrupt handler from the device driver of an imaginary device which, for the sake of making an interesting example, is taken as having two modes of operation: synchronous and asynchronous.

[1086] Initially, on powering up, the main device driver thread requests and registers a new interrupt handler thread (FIG. 8B). In response, the entry point of the Embedded Virtual Machine (EVM) interrupt handler is registered with the Real Time Operating System (RTOS) interrupt handling services via the EVM native code. The new thread is then started up and runs up to the point where it reaches the line, ‘waitForFirstInterrupt’ in the pseudo code (FIG. 8C) and then terminates but without rescinding the stack. Rather, the stack is associated with the relevant interrupt and goes into a state of ‘limbo’, with no RTOS thread, and waiting to be re-activated later from the position where the Java thread had been terminated mid-method.

[1087] At some later time the main device driver thread issues an input/output (I/O) instruction to a device (represented in the hardware column in FIG. 8B) which will cause an interrupt to occur later and signal to the RTOS to call the native code embodying the present invention. The EVM native code then switches to the dormant Java stack and does a return. From there on, the Java interrupt handler code continues to run but it appears to the outside world as though it was continuing within the original thread. Once the interrupt has been dealt with, the interrupt handler method terminates at a call made on ‘waitForInterrupt,’ control switches back to the RTOS stack and returns. The RTOS dismisses the hardware interrupt (IRET) and the interrupt is dismissed. The loop returns to the head of the ‘Issues I/O’ block in the main device driver thread column in FIG. 8B to begin the sequence again when a fresh instruction to the device is initiated. The loop is made infinite by the ‘loop forever’ feedback in the main device driver thread in FIG. 8B.

[1088] Since the interrupt handler Java method is already ‘active’ and ready to execute as soon as an interrupt occurs, faster execution is possible.

[1089] The pseudo-code shown in FIG. 8C is largely self-explanatory, once the sequence of events shown in the time line of FIG. 8B, as just described, is appreciated. However, it will be noted that FIG. 8C makes specific provision for synchronous and asynchronous modes. It has the consequence, though, that each time an interrupt occurs the system has to establish which mode is in operation since there is no runtime code to indicate mode.

[1090] By the use of code such as that in FIG. 8C, where there are multiple call sites for the ‘waitForInterrupt’ method, the handler can enter into the code of the interrupt handler method at the relevant point and the next time there is a call it can enter in another place. That is to be contrasted with an alternative embodiment in which the method is not already opened but is opened when there is a call as an interrupt occurs. Not only is such a method slower but, since execution always begins at the top of the code, this feature is not possible. A different loop can be used for dealing with the synchronous/asynchronous question.

[1091] In the example of FIG. 8C, the device has two special states: asynchronous and synchronous. For example, a plotter device might have two modes, absolute and relative, with different handling of interrupts required for each mode.

[1092] The ‘waitForFirstInterrupt’ method is an important feature of the interrupt handling mechanism of the Java VM of preferred embodiments. It destroys (where possible) any O/S related thread components apart from the stack (this stack contains information concerning just where in the Java application the particular call to ‘waitForFirstInterrupt’ was made from); the location of this stack is registered with the interrupt handling mechanism of the Insignia Java VM for later use with respect to a particular device or interrupt.

[1093] In summary, when the first interrupt is received from the relevant device, the operating system will enter the interrupt handler of the interrupt handling mechanism of the Insignia Java VM; this will switch stacks to the relevant Java stack preserved earlier and then execute a native method return sequence which will, as always, re-enter the Java execution engine at the location following the native method (as recorded in the stack).

[1094] At this point, the bytecode execution engine is executing Java bytecode at the O/S interrupt level; this places various constraints upon the bytecode execution engine whilst executing such bytecode, such as not attempting any blocking thread synchronisation operation, and not doing anything that could interfere with or fail because of any phase of garbage collection occurring (potentially simultaneously) at non-interrupt level. These sub-problems are covered later.

[1095] At this point it is worth noticing that this solution is completely compatible with the dynamic (or off-line pre-) compilation technology described elsewhere. It is quite possible that the bytecode being referenced has been compiled into machine code for speedy execution; the native method return mechanism will select the machine code version if present, or otherwise select the interpreter for bytecode-by-bytecode interpretation.
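The selection made on return from the native method can be illustrated with a small lookup. This is a sketch under assumptions: `ReturnDispatchSketch`, the string method keys and the trivial `interpret` stand-in are hypothetical, and a real virtual machine would select between machine code and an interpreter loop rather than between Java lambdas.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Illustrative only: choose a compiled version when one exists, else interpret.
final class ReturnDispatchSketch {
    // Stand-in for the set of code fragments that have been compiled so far.
    private final Map<String, IntUnaryOperator> compiled = new HashMap<>();

    void installCompiled(String method, IntUnaryOperator code) {
        compiled.put(method, code);
    }

    // Hypothetical stand-in for bytecode-by-bytecode interpretation.
    private int interpret(String method, int arg) {
        return arg + 1;
    }

    // Resumes execution at the given method, preferring the compiled version.
    String resume(String method, int arg) {
        IntUnaryOperator code = compiled.get(method);
        if (code != null) {
            return "compiled:" + code.applyAsInt(arg);
        }
        return "interpreted:" + interpret(method, arg);
    }
}
```

Both paths compute the same result in the sketch; only the route taken differs, which mirrors the compatibility claim in the text.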

[1096] When the Java code of the interrupt handler has (i) interacted with the device using native methods supplied as part of the Insignia hardware access Java package as appropriate to the specifics of the device and interrupt type that has occurred and, (ii) interacted with the rest of the Java application involved with the device (the non-interrupt part of the application, that is) through the use of native methods in the Insignia interrupt handling package as appropriate, it must allow the execution of normal, non-interrupt code to continue. This is achieved by calling another special native method, ‘waitForInterrupt’ (as opposed to ‘waitForFirstInterrupt,’ above).

[1097] The ‘waitForInterrupt’ native method gets the Java stack ready for a subsequent activation by another interrupt and then switches back to the O/S's small, dedicated interrupt stack and then performs the return appropriate to the particular O/S, allowing it to perform the actions necessary to return to non-interrupt running.

[1098] The individual problems which arise as a result of the use of the interrupt handler in Java will now be discussed in more detail:

[1099] Firstly, as noted above, the bytecode execution engine must not attempt any potentially blocking synchronisation operations while executing the bytecode of an interrupt handler.

[1100] Semaphores are used to synchronise threads. Consider, for example, the following situation in a multi-threaded environment. A non-interrupt thread begins an operation having acquired a semaphore. The non-interrupt thread is mid-way through the operation when an interrupt is called. Control switches to an interrupt thread. Control does not then switch away from the interrupt thread until the interrupt has been handled since the interrupt is always dealt with as a priority.

[1101] If the interrupt handler needs to carry out the operation being carried out by the non-interrupt thread, a problem will occur since the interrupt thread cannot enter the operation until the non-interrupt thread has released the semaphore, and the non-interrupt thread cannot run until the interrupt has been dealt with.
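The hazard just described can be made concrete with `java.util.concurrent.Semaphore`. The sketch below is illustrative only: it runs the two roles one after the other on a single thread to model the interrupt pre-empting the non-interrupt code, and it uses the non-blocking `tryAcquire` so that the demonstration reports the failure instead of hanging. The class and method names are hypothetical.

```java
import java.util.concurrent.Semaphore;

// Illustrative only: shows why a handler must not wait on a semaphore
// that pre-empted non-interrupt code may be holding.
final class BlockingHazardSketch {
    // Returns whether the simulated interrupt handler could enter the guarded operation.
    static boolean interruptCouldProceed() {
        Semaphore guard = new Semaphore(1);

        // Non-interrupt code enters the operation and holds the only permit.
        guard.acquireUninterruptibly();

        // The interrupt arrives here. A blocking acquire() would never return,
        // because the holder cannot run again until the handler has finished.
        boolean acquired = guard.tryAcquire();
        if (acquired) {
            guard.release();
        }

        // Only after the handler has returned can the holder finish and release.
        guard.release();
        return acquired;
    }
}
```

Because `Semaphore` permits are not owned by a thread, the second acquisition fails even on the same thread, which is what makes the single-threaded model a fair picture of the two-level situation.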

[1102] Thus it can be seen that blocking calls must be avoided while the interrupt method is being executed.

[1103] It is important therefore to ensure that the normal routes through the bytecode execution engine have no potentially blocking synchronisation operations; this is desirable from a performance point of view anyway. Additionally, it must be ensured that the nature of the bytecode of an interrupt handler never requires other than the normal routes through the bytecode execution engine. Thus, we are aware of all paths that can be used at the interrupt level, and we make sure that there are no blocking calls. Native calls to specific methods (see below) can also be used to overcome the problem of the requirement for no blocking calls when communicating with the non-interrupt level.

[1104] In the case of Java (as opposed to a more general interpreted language), this latter point means that constant pool entries must be pre-resolved (constant-pool resolution is a process that normally occurs the first time that particular bytecode is executed and can result in many potentially blocking synchronisation and I/O operations).

[1105] In essence, neither the heap, mutexes, nor synchronised operations can safely be used. Java has two ways of using synchronisation, namely (1) by using the ‘synchronized’ keyword on a block of code or (2) by using methods which are themselves declared to be synchronised. The onus is on the writer of Java code to make sure there are no blocking calls. For example, when the compiler processes source files with Java extensions (i.e., ______.java), it generates classes. References in those classes are followed at runtime and the relevant information is cached etc., so as not to have to repeat the process on second and subsequent visits. As a result, the interrupt Java bytecode has to be pre-resolved, as already mentioned.
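For readers less familiar with Java, the two forms of synchronisation referred to above look as follows. The class `SyncFormsSketch` is a hypothetical illustration; `Thread.holdsLock` is used only to make visible that a monitor is held in the first two methods and not in the third, which is the shape interrupt-level code would have to take.

```java
// Illustrative only: the two ways Java code can acquire a monitor.
final class SyncFormsSketch {
    private final Object lock = new Object();

    // Form (1): a synchronized statement on an explicit object.
    boolean viaStatement() {
        synchronized (lock) {
            return Thread.holdsLock(lock);
        }
    }

    // Form (2): a method declared synchronized, which locks 'this'.
    synchronized boolean viaMethod() {
        return Thread.holdsLock(this);
    }

    // Neither form: no monitor is acquired, so the call cannot block on one.
    boolean unsynchronised() {
        return Thread.holdsLock(this) || Thread.holdsLock(lock);
    }
}
```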

[1106] In a preferred embodiment, the Java bytecodes of interrupt handlers are pre-compiled (although this is not strictly necessary if the dynamic compilation system does not require the interpreter to perform any potentially blocking synchronisation operations as a matter of course).

[1107] Two types of situations may occur in which blocking calls might ordinarily be used.

[1108] In the steady state situation in which the flow of control is following normal paths of execution, it is necessary for the code to be written so as not to contain any semaphores which could be encountered and acquired by the interrupt handler. Thus, for such normal control paths, blocking calls must not be used.

[1109] A special situation includes the case in which code is encountered and executed for the first time. In such cases, operations requiring semaphores, for example, the loading of classes, may be required. In such operations, semaphores are unavoidable. Thus, since semaphores cannot be used at interrupt level, such code is pre-resolved so that all the necessary classes have already been loaded before the first interrupt is encountered.

[1110] Such pre-resolution may include the pre-compilation of the code of the non-native interrupt handler code. The pre-resolution is carried out at start-up, or may be effected during the building of the system.
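At application level, the idea of loading all necessary classes before the first interrupt can be approximated with `Class.forName`. The sketch below is an assumption-laden illustration: `PreResolveSketch` and its `preload` method are hypothetical, and forcing class loading and initialisation in this way is only an analogue of the constant pool pre-resolution performed inside the virtual machine, not the mechanism itself.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: force classes to be loaded and initialised at start-up.
final class PreResolveSketch {
    // Loads and initialises each named class now, so that first use later
    // does not trigger class loading (and its locking) at an awkward time.
    static List<Class<?>> preload(String... classNames) {
        List<Class<?>> loaded = new ArrayList<>();
        ClassLoader loader = PreResolveSketch.class.getClassLoader();
        for (String name : classNames) {
            try {
                loaded.add(Class.forName(name, true, loader));
            } catch (ClassNotFoundException e) {
                // Fail at start-up rather than at interrupt level.
                throw new IllegalStateException("cannot pre-resolve " + name, e);
            }
        }
        return loaded;
    }
}
```

A start-up phase might call `preload` with the names of every class the handler touches; a missing class then surfaces immediately instead of on the first interrupt.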

[1111] Secondly, the bytecode execution engine must not do anything at the interrupt level that could interfere with or fail because of any phase of garbage collection occurring (potentially simultaneously) at non-interrupt level. In a preferred embodiment, this is achieved basically by denying interrupt level code the full flexibility of the garbage collected Java heap as follows:

[1112] The special Java thread that includes an interrupt handler is allowed to allocate objects as part of its start-up phase (before it calls the special native method, ‘waitForFirstInterrupt’); these objects will persist for the entire life-time of the system (they will never be recycled by the garbage collector). At the time that the Java thread ceases to be normal (just becoming a stack for use at interrupt level as described above), this set of heap objects becomes the set of the only heap objects that the interrupt-level Java code can ever see; in this way, this set of objects is a fixed presence in the Java heap that is independent of garbage collection activities; in this way also, the garbage collector running at non-interrupt level can carry on in confidence that interrupt-level Java code can never interfere with its operation (or vice versa) because the interrupt level code will only ever be dealing with its own set of objects.
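The discipline of allocating everything during start-up and only reusing it afterwards can be sketched as a fixed pool. The names `PreallocatedPoolSketch`, `Sample`, `claim` and `release` are hypothetical; the sketch shows the programming pattern only and does not model the separate heap that the embodiment isolates from the garbage collector.

```java
// Illustrative only: all objects are created up front; later code reuses them.
final class PreallocatedPoolSketch {
    static final class Sample {
        int value;
        boolean inUse;
    }

    private final Sample[] pool;

    // Start-up phase: the only place where 'new' is used.
    PreallocatedPoolSketch(int capacity) {
        pool = new Sample[capacity];
        for (int i = 0; i < capacity; i++) {
            pool[i] = new Sample();
        }
    }

    // Interrupt-level path: claims an existing object and never allocates.
    // Returns null when every object is already in use.
    Sample claim(int value) {
        for (Sample s : pool) {
            if (!s.inUse) {
                s.inUse = true;
                s.value = value;
                return s;
            }
        }
        return null;
    }

    void release(Sample s) {
        s.inUse = false;
    }
}
```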

[1113] It is permissible for non-interrupt Java code to see references to interrupt Java objects. A crucial thing is that it must not use this as an opportunity to store references to non-interrupt Java objects into these interrupt objects for interrupt Java code to see. It is not permissible for the interrupt Java code to see references to non-interrupt Java objects.

[1114] Policing mechanisms can be put into place on development VMs to ensure that this policy is not violated. For example, a mechanism can be put into place to prevent non-interrupt level Java from storing anything in an interrupt level Java object.
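One possible shape for such a policing check is sketched below. It is a hypothetical illustration, not the mechanism of the development VMs: `StorePolicingSketch` tracks interrupt objects by identity and tests each proposed reference store against the rule that an interrupt object must never come to refer to a non-interrupt object. The stricter variant mentioned in the text, forbidding any store at all from non-interrupt code, would simply reject every store whose target is an interrupt object.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Illustrative only: a check a development VM might apply to reference stores.
final class StorePolicingSketch {
    // Interrupt objects are tracked by identity, not by equals().
    private final Set<Object> interruptObjects =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());

    void registerInterruptObject(Object o) {
        interruptObjects.add(o);
    }

    // True if storing 'value' into a reference field of 'target' respects the policy.
    boolean storeAllowed(Object target, Object value) {
        if (!interruptObjects.contains(target)) {
            return true; // ordinary objects may refer to anything
        }
        // An interrupt object may hold only null or another interrupt object.
        return value == null || interruptObjects.contains(value);
    }
}
```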

[1115] As indicated above, it is important to separate GC from the interrupt level. When the interrupt goes off, the GC could be anywhere in the system. If, for example, the GC has acquired a semaphore, that may lead to problems at interrupt level as discussed above.

[1116] Furthermore, if the interrupt handler were able to alter non-interrupt objects, it might write something to an object which had already been marked for deletion by the GC, or might change an object which would confuse the GC system.

[1117] Thus, the allocation of the interrupt level objects is made from a separate part of the memory and they are kept separate from the GC.

[1118] Furthermore, the interrupt handler is not able to see non-interrupt objects so that it cannot try to change them. That might cause a problem, for example, if the interrupt handler tried to change an object that the GC had been half-way through moving when the interrupt occurred.

[1119] Thirdly, the solutions of the last two problems would seem to indicate that communication between interrupt-level Java and non-interrupt-level Java is hard, if not impossible. In a preferred embodiment of the invention, a special mechanism is made available to the Java application programmer to enable the passing of information from the Java code which runs at interrupt level to the rest of the application.

[1120] Since the making of blocking calls and new objects are both to be avoided as far as possible, how does interrupt level Java code communicate with the rest of the application?

[1121] Normally, ‘wait’ and ‘notify’ would be used but the present context would necessitate synchronisation on the Java object. However, we have previously stated that synchronisation (e.g., in the case where Object o = new Object(); and code is synchronised (o) for wait and notify) is not permitted for interrupts. Therefore, we provide our own native methods that look like calls on Java methods but which are written in C or Assembler language.

[1122] Native methods are provided as part of the interrupt package to allow the passing of information from the interrupt level to the non-interrupt level; this allows the non-interrupt code to be suspended inside a call on the read native method and to be woken when an interrupt has completed having made a call on the associated write method (on ‘SpecialChannel.write’ in FIG. 8C).

[1123] The ‘specialChannel.write’ instruction is a virtual invoke of the ‘specialChannel.write’ native method. Thus, for a C operating system, a C function is called to carry out the write method. The C native method then sends a message to a corresponding ‘read’ native method at a non-interrupt level. The non-interrupt method may be suspended waiting at ‘read.’ Thus, the interrupt level can communicate with the non-interrupt level of the non-native code, without any blocking calls being required.
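The essential property of the write side, that it neither blocks nor allocates, can be illustrated in pure Java with a fixed-size single-producer, single-consumer ring buffer. This is a sketch under assumptions and not the native implementation: the class `SpecialChannelSketch` is hypothetical, the reader polls rather than being suspended and woken, values are assumed to be non-negative so that -1 can signal an empty channel, and a full channel drops the value.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: one writer (interrupt level), one reader (non-interrupt level).
final class SpecialChannelSketch {
    private final int[] slots;                              // allocated once, at start-up
    private final AtomicInteger head = new AtomicInteger(); // next slot to read
    private final AtomicInteger tail = new AtomicInteger(); // next slot to write

    SpecialChannelSketch(int capacity) {
        slots = new int[capacity + 1]; // one slot is kept empty to tell full from empty
    }

    // Interrupt side: never blocks and never allocates; returns false if full.
    boolean write(int value) {
        int t = tail.get();
        int next = (t + 1) % slots.length;
        if (next == head.get()) {
            return false;
        }
        slots[t] = value;
        tail.set(next); // publishes the slot to the reader
        return true;
    }

    // Non-interrupt side: returns the next value, or -1 if nothing is waiting.
    int read() {
        int h = head.get();
        if (h == tail.get()) {
            return -1;
        }
        int v = slots[h];
        head.set((h + 1) % slots.length);
        return v;
    }
}
```

Only the writer updates `tail` and only the reader updates `head`, so neither side ever needs a lock; that restriction to one thread per side is what the design relies on.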

[1124] FIG. 8D shows an apparatus for carrying out an embodiment of the invention. The apparatus includes hardware 18000 (which will generate interrupts), a Real Time Operating System (RTOS) 18002 and an associated EVM native code device 18004. The apparatus further includes a Java main device driver 18006 which can issue I/O to the hardware 18000. The EVM native code device 18004 is connected to the Java interrupt handler 18008. The interaction of these components of the apparatus is described with relation to FIG. 8B.

[1125] The apparatus also includes a garbage collector 18010. It will be seen that the garbage collector 18010 has access to the Java main device driver 18006 and another Java thread 18012, but not to the Java interrupt handler 18008, which includes objects 18016 and a heap 18014 which are isolated from the garbage collector 18010. The interrupt handler 18008 also includes a stack 18018. Native calls can be made from the interrupt handler 18008 to the OS 18004, 18002 and on to the non-interrupt levels 18006, 18012.

[1126] In any or all of the aforementioned, certain features of the present invention have been implemented using computer software. However, it will of course be clear to the skilled man that any of these features may be implemented using hardware or a combination of hardware and software. Furthermore, it will be readily understood that the functions performed by the hardware, the computer software, and such like are performed on or using electrical and like signals.

[1127] Features which relate to the storage of information may be implemented by suitable memory locations or stores. Features which relate to the processing of information may be implemented by a suitable processor or control means, either in software or in hardware or in a combination of the two.

[1128] In any or all of the aforementioned, the invention may be embodied in any, some or all of the following forms: it may be embodied in a method of operating a computer system; it may be embodied in the computer system itself; it may be embodied in a computer system when programmed with or adapted or arranged to execute the method of operating that system; and/or it may be embodied in a computer-readable storage medium having a program recorded thereon which is adapted to operate according to the method of operating the system.

[1129] As used herein throughout the term ‘computer system’ may be interchanged for ‘computer,’ ‘system,’ ‘equipment,’ ‘apparatus,’ ‘machine,’ and like terms. The computer system may be or may include a virtual machine.

[1130] In any or all of the aforementioned, different features and aspects described above, including method and apparatus features and aspects, may be combined in any appropriate fashion.

[1131] It will be understood that the present invention(s) has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.

[1132] Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

[1133] Agent's Reference No. 9 - Computer System, Computer-Readable Storage Medium and Method of Operating Same, and Method of Operating That System

[1134] This invention relates to a computer system and to a method of operating a computer system. More particularly, the invention preferably relates to a computer system and a method of operating a computer system including an object oriented program. In a preferred embodiment, the invention relates to the use of a class loader to allow direct invocation of non-final instance methods.

[1135] This invention applies preferably to virtual machines (VM) executing Object Oriented (OO) programs, where the classes of the objects in question are dynamically loaded and/or discovered by the virtual machine. It applies particularly to Virtual Machines where some optimisations can be performed if a potentially polymorphic Method of a class can be safely assumed to be non-polymorphic; for example in a dynamically compiling virtual machine.

[1136] The invention is concerned with Method inheritance, ‘Method’ being a term in Java and other programming languages for functions such as, for example, ‘area’ of a circle and any of the other functions such as ‘play,’ ‘turn on lights,’ and so on, as already discussed in other Agent's References of this specification.

[1137] In this part of the present application relating to the use of a class loader to allow direct invocation of non-final instance Methods, the term ‘Method’ (with capital ‘M’) will be used to relate to Methods of the programming language (also known by, but not restricted to, other terms including ‘functions’ and ‘routines’); the term ‘method’ (with lower case ‘m’) will be used in respect of the procedure of carrying out the invention (unless it is clear otherwise from the context).

[1138] In prior systems and methods a call to a Method of a given namewill cause one of a number of different implementations of the namedMethod to be executed according to which object one is interested in(e.g., a ‘play’ function in a video recorder, tape recorder, etc.). Thisis called a ‘polymorphic call’. Under these circumstances, one wouldcompile differently according to the object. Because of these problemsone would make no assumptions about the destination Method, and so onehas to compile the call to it less than optimally.

[1139] Java is an example of a language which is heavily objectoriented; Java allows single inheritance. Other languages, for exampleC++, allow multiple inheritance. Inheritance leads to polymorphism.

[1140] In an object oriented environment, a Method is said to be polymorphic if a number of different implementations of the Method are available in the system, and the implementation used is chosen at each point where the Method is invoked and at each time that the point of invocation is reached in the execution of the program. This situation typically comes about in object oriented systems due to inheritance, whereby a class (a description of a type of object, including the Methods that can be invoked on instances of that class) is taken as a basis for a subsequent class. This new class is termed the subclass of the original class, which is termed the superclass, and the subclass is said to inherit all the aspects (including the Methods) of the superclass. However, the new subclass can override some or all of the Methods inherited from the superclass and provide its own implementations of these Methods; these overridden Methods are now polymorphic, and the implementation which is chosen to be used in any case where one of these overridden Methods is invoked is governed by the class of the object that the invocation is associated with.

[1141] For example, a single named Method may appear multiple times in the class hierarchy and each appearance of the named Method may correspond to a different implementation of the Method. The actual Method that is run will depend on the object relating to the named Method.
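
By way of illustration only, the polymorphic call described above might be sketched in Java as follows. The class names (Recorder, VideoRecorder, TapeRecorder, PolymorphicCallDemo) are hypothetical and are chosen to echo the ‘play’ example; they do not appear elsewhere in this specification.

```java
// Hypothetical classes illustrating a polymorphic call; names are illustrative only.
class Recorder {
    String play() { return "Recorder.play"; }
}

class VideoRecorder extends Recorder {
    @Override
    String play() { return "VideoRecorder.play"; }
}

class TapeRecorder extends Recorder {
    @Override
    String play() { return "TapeRecorder.play"; }
}

class PolymorphicCallDemo {
    // The implementation that runs is governed by the class of the receiver,
    // not by the declared type of the parameter.
    static String invoke(Recorder r) {
        return r.play();
    }
}
```

The single call site inside invoke can reach any of the three implementations, which is why a compiler that knows nothing further must compile it as a general, dynamically dispatched call.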

[1142] One approach to the situation where the system can be affected by classes that are discovered and/or loaded at some time after a class or Method or part of a Method is converted to a compiled form is to make no assumptions in the compiled version about the Method being invoked by the dynamic mechanism. In the Java environment, Methods can be marked as “final,” which means that it is illegal to override them in subsequent subclasses. This allows assumptions to be made about which Method implementation is being invoked, but the majority of Methods in typical Java classes are not so marked for reasons of flexibility.

[1143] Whilst the approach described above will yield a system that works, a potentially large number of optimisation opportunities will be missed, since in normal use a Method that is not polymorphic at the time that the compilation of a call to it is attempted is seen, in the majority of cases, to remain non-polymorphic. If, however, the assumption is made that the Method is not polymorphic, then the system runs into problems if the assumption is later found to be false, due to a new class being loaded into the system.

[1144] In one aspect of the present invention, our solution to the problem of optimising the system aims to optimise for the set of currently loaded classes. In a second aspect of the present invention, if another class is loaded that overrides some Methods of previously loaded classes, optimisation will be changed for calls to Methods that the new class overrides, that is, if we discover that the Method is polymorphic, then we go back and undo the specific optimisation assumptions.

[1145] According to the first aspect of the invention, there is provided a method for compiling code, the code including a call to a Method which is potentially polymorphic, the method including compiling the code on the basis of an assumption that the Method is not polymorphic.

[1146] In an object oriented program, a Method may have the potential to be polymorphic but may in fact be non-polymorphic. By making the assumption that the Method is non-polymorphic, various optimisations may be made in the compilation of the code, which may give, for example, a reduction of the amount of memory occupied by the compiled code and faster execution of the compiled code.

[1147] Preferably, when the assumption is made that the Method is non-polymorphic, the Method is marked as ‘assumed final.’

[1148] Preferably, the compilation includes optimisation of the call to the Method.

[1149] In one embodiment of the first aspect of the invention, the optimisation includes in-lining. In particular, there is preferably in-lining of the single implementation of the invoked Method. As is described in more detail below, the Method being invoked may be moved on compilation to be in line with the code including the invoke of the Method so that fewer jumps between portions of code are required on execution of the code and the cost of the frame creation is avoided. This leads to faster execution of the compiled code.

[1150] In an alternative embodiment of the first aspect of the invention, the optimisation includes forming a patch. Where an assumption is made that a Method is non-polymorphic, a patch may be formed between the compiled code invoking the Method and the compiled code of the Method. The formation of patches is discussed in more detail in Agent's Reference No. 12 of this specification.

[1151] As indicated above, where a Method is non-polymorphic at the time of compilation of a call to that Method, it is often found that in the majority of cases the Method remains non-polymorphic. However, there will be some occasions in which the assumption is found to be false. For example, a new class being loaded may include a new sub-class including a new instance of a Method which previously had been assumed to be non-polymorphic.

[1152] Preferably, the method further includes the step of creating a marker if the assumption has been made. Preferably, the marker is associated with the Method which has been assumed to be non-polymorphic and preferably the marker is made in the data structure of the Method which has been assumed to be non-polymorphic.

[1153] By creating a marker to indicate that the assumption has been made that a Method is non-polymorphic, a check can be made to see whether any assumptions have been made which should be overridden. Such a search is preferably carried out when a new class, in particular a new sub-class, is loaded into the system.
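
A minimal sketch of such a marker, written in Java purely for illustration, is given below. The class MethodData, its fields and its method names are hypothetical. The list of dependent compiled fragments is an added assumption intended to show how the marker could lead back to the code that relied on it; the specification requires only that the marker be present in the data structure of the Method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-Method data structure; all names are illustrative only.
class MethodData {
    final String name;
    private boolean assumedFinal;                                  // the marker
    private final List<String> dependentCode = new ArrayList<>();  // assumption: not required by the text

    MethodData(String name) {
        this.name = name;
    }

    // Called when a call to this Method is compiled as if the Method were final.
    void markAssumedFinal(String compiledFragmentId) {
        assumedFinal = true;
        dependentCode.add(compiledFragmentId);
    }

    // Checked when a new class is loaded that overrides this Method.
    boolean isAssumedFinal() {
        return assumedFinal;
    }

    // Compiled fragments to revisit if the Method is later overridden.
    List<String> dependentCode() {
        return new ArrayList<>(dependentCode);
    }
}
```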

[1154] Preferably, the code is a dominant path fragment. Preferably, the compiler is arranged to compile only dominant path fragments of code. See the Agent's Reference No. 1 of this specification for a discussion of preferred features of the compiler and the method of compilation of dominant path fragments. By compiling dominant path fragments of code, only those fragments which are frequently executed will be compiled, thus reducing the time and memory taken by the compilation of infrequently executed sections of code.

[1155] Preferably, the code is code of an object oriented language, preferably Java.

[1156] The first aspect of the invention also provides a method of operating a computer system, the method including a method of compiling code as described above.

[1157] The first aspect of the invention also provides a computer system including a compiler for compiling code, the compiler being arranged so that, when compiling code including a call to a Method which is potentially polymorphic, the compiler compiles the code on the basis of an assumption that the Method is not polymorphic.

[1158] Preferably, the system includes means for marking the Method as ‘assumed final’ if the assumption is made that the Method is non-polymorphic. Preferably, the compilation system marks the Method as ‘assumed final.’ The marking may be carried out by a compiler manager.

[1159] Preferably, the compiler includes means for optimising the call to the Method. Preferably, the compiler is arranged to in-line the Method or to create a patch to the Method.

[1160] Preferably, the system further includes means for creating a marker if the assumption has been made. Preferably, the marker is created by the compilation system, preferably by the compiler manager.

[1161] Preferably, the code is code of an object oriented language, preferably Java.

[1162] According to the first aspect of the invention, there is also provided a compiler for compiling code by a method described above.

[1163] The first aspect of the invention also provides code compiled by a method described above.

[1164] According to a second aspect of the invention, there is provided a method of introducing a class into a computer system, the method including the step of determining whether a Method of the class has previously been assumed to be non-polymorphic. That determination may be made, for example, by checking for the presence of a marker which may be provided in accordance with the first aspect of the present invention. The determination may be made by checking the data structure of the Method.

[1165] Thus, the second aspect of the invention finds particular application where the assumption has been made in the compilation of code, for example in accordance with the first aspect of the invention.

[1166] The introduction of the class may include the loading of a new class into the system.

[1167] The second aspect of the invention finds particular application in the loading of a new class into the system where the new class being loaded is a subclass of a class already on the system.

[1168] If a Method of the new class is found to have been assumed to be non-polymorphic, alterations are preferably carried out in respect to any optimisations made on the assumption that the Method was non-polymorphic.

[1169] Preferably, the method further includes adjusting the compiled code if it is determined that the Method has been assumed to be non-polymorphic. The adjustment may be to the compiled code including the call to the Method.

[1170] The adjustment of the compiled code may include deletion of a section of compiled code. The deleted code may include the call to the Method. For example, where optimisation of the compilation of the code has included in-lining, deletion of the compiled section of the code including the in-lined section may be required.

[1171] Alternatively, or in addition, the adjustment of the compiled code may include the undoing of a patch. For example, where optimisation of the compilation of the code has included a patch pointing directly from the invoke of a Method to the Method, the patch is preferably undone. Alternatively, the patch and/or the compiled code associated with the patch may be deleted.

[1172] Preferably, the alteration of the compiled code is carried out atomically. This is of particular importance in a multi-threaded environment in which several threads are executing simultaneously. If the adjustment were not carried out atomically in such an environment and the threads were allowed to continue executing while the adjustments were carried out, there is a risk that a thread may be actively executing in a region of compiled code at the same time as an adjustment of that region of code is being made. That would clearly be most disadvantageous. If the adjustment were not carried out atomically, it would be necessary to stop the execution of all threads which might enter the region of code to be adjusted while the adjustments were carried out. Preferably, checks are carried out prior to the alteration, in particular the deletion, of code to ensure that it is safe to change the relevant section of code. Preferably, the method includes the step of carrying out stack walking prior to alteration of the compiled code (see the Agent's Reference Nos. 6 and 12 of this specification). Stack walking will not normally be required for undoing a patch but is preferably carried out before deletion of a compiled version of a block of code.

[1173] In many cases, it will be preferred that the relevant section of compiled code is not deleted as soon as the assumption is found to be incorrect, but that the compiled section of code may be made unavailable for execution. It is preferred that the relevant section of compiled code is deleted in due course to release the memory occupied by the compiled section. Until the section is deleted, however, the compiled code is preferably either marked that it should not be used, or adjustments are made elsewhere so that the compiled code is not used. For example, in one embodiment of an invention of Agent's Reference No. 1 of this specification, the dispatch table of a Method is marked if there is a compiled version of any of the fragments of the Method. Thus, if the assumption is found to be incorrect, the dispatch table of the Method can be altered so as not to refer to the compiled fragment, thus making it unavailable for execution.

[1174] Preferably, an interpreter is used if it is determined that a Method of the new class has been assumed to be non-polymorphic in the compilation of the code. Execution preferably then continues using an interpreter. While it would be possible to wait while a fresh compiled version is made which does not make the assumption, that is not preferred. Where compiled code is deleted or made unavailable, preferably a fallback interpreter is used unless or until the compiled code is replaced. Further discussion of the use of a fallback interpreter can be found in Agent's Reference No. 1 of this specification.

[1175] Preferably the method of operating a computer system of the second aspect of the invention also includes features of the method of the first aspect of the invention relating to the compilation of code.

[1176] The second aspect of the present invention also provides a method of operating a computer system including the steps of:

[1177] compiling a call to the Method for a given class;

[1178] determining for a new sub-class whether a Method of the class has previously been treated as final; and

[1179] adjusting the compilation of the call to the Method for the given class if the Method is not final.

[1180] The method according to the previous paragraph takes advantage of the assumption that the Method being called is “final.”

[1181] Preferably, the adjustment of the compiled code is carried out before the class is introduced. For example, where a new class is being loaded into the system, both the search for non-polymorphic Methods and any necessary adjustment to the compiled code are made before the loading of the new class is completed.

[1182] The introduction of a new sub-class may be effected by the use of a class loader. Preferably, the class loader calls into the compilation system. If assumptions have been made during the compilation of the code which may be overridden by a class to be loaded, preferably, the compilation manager deals with the situation by either undoing the patch or making the compiled version of the compiled code unavailable for execution, for example by deletion and/or changing the dispatch table of the Method until, for example, the deletion of the code is effected.

[1183] According to the second aspect of the invention, there is provided a method of loading a class using a class loader into a system including compiled code, in which the class loader determines whether assumptions have been made in the compiled code which may be overridden by the class.

[1184] If an assumption has been made which is to be overridden, preferably, the class loader calls into the manager of the compiled code. That call may lead to the adjustment of the compiled code as indicated above.

[1185] In a further aspect, the invention provides a computer system including means for compiling calls to a Method for a given class, means for determining whether the Method can be treated as final, and means for adjusting the compilation of the call to the Method for the given class on the basis of the determination.

[1186] The invention therefore enhances opportunities for optimisation of the computer system.

[1187] The second aspect of the invention also provides a computer system including a means for introducing a new class, the system including means for determining whether a Method of the class has previously been assumed to be non-polymorphic. The means for introducing a new class may be a class loader.

[1188] Preferably, the system further includes means for altering the compiled code if it is determined that a Method of the new class has been assumed to be non-polymorphic in the compilation of code. The alteration of the compiled code is preferably carried out by the compilation system, preferably, the compiler manager.

[1189] Preferably, the system includes means for deleting compiled code, which may include means for undoing a patch. The compiler manager may include the deletion device.

[1190] Preferably, the system includes a stack walking device.

[1191] Preferably, the system further includes an interpreter.

[1192] According to the second aspect of the present invention, there is provided a computer system including:

[1193] a compiler for compiling a call to the Method for a given class;

[1194] means for determining for a new sub-class whether a Method of the class has previously been treated as final; and

[1195] means for adjusting a previously compiled version of the call to the Method for the given class if the Method is not final.

[1196] Preferably, the means for introducing the new class includes a class loader.

[1197] The second aspect of the invention further provides a class loader for use in a computer system as described above.

[1198] In a further aspect, the invention provides a computer system including means for compiling calls to a Method for a given class, means for determining whether the Method has previously been treated as final, and means for adjusting the compilation of the call to the Method for the given class if the Method is not final.

[1199] Also provided by the invention is a computer-readable storage medium having a programme recorded thereon for carrying out a method according to the first and/or the second aspects of the invention as described above.

[1200] The invention extends to: a computer-readable storage medium having a programme recorded thereon for carrying out a method for compiling code, the code including a call to a Method which is potentially polymorphic, the method including compiling the code on the basis of an assumption that the Method is not polymorphic.

[1201] The invention also extends to a computer-readable storage medium having a programme recorded thereon for carrying out a method of introducing a class into a computer system, the method including the step of determining whether a Method of the class has previously been assumed to be non-polymorphic.

[1202] Further, the invention extends to a computer when programmed according to a method as aforesaid.

[1203] The invention also extends to a computer when programmed according to a method for compiling code, the code including a call to a Method which is potentially polymorphic, the method including compiling the code on the basis of an assumption that the Method is not polymorphic.

[1204] Further, the invention extends to a computer when programmed according to a method of introducing a class into a computer system, the method including the step of determining whether a Method of the class has previously been assumed to be non-polymorphic.

[1205] In summary, the problems outlined above are solved by the various aspects of this invention using a number of factors. First of these is the ability to adjust the action of the class loader in this situation to notify the manager of the compiled code when an assumption about the finality of a Method previously made during prior compilation is found to be false. The second factor is the ability to, at any time, remove from the system compiled code which is no longer wanted for whatever reason. A third factor is the use of patches to existing compiled code sequences, allowing the code action to be adjusted whilst the code is “live,” and being potentially executed by one or more threads of the virtual machine.

[1206] Any, some, or all of the features of any aspects of the invention may be applied to any other aspect.

[1207] The following considerations apply to any and all the inventions and aspects of the inventions described above.

[1208] Preferred embodiments of the invention will now be described, purely by way of example, having reference to the accompanying figures of the drawings (which represent schematically the improvements) in which:

[1209] FIG. 9A shows a flow diagram illustrating a preferred embodiment;

[1210] FIG. 9B shows a section of compiled code;

[1211] FIG. 9C shows a different section of compiled code; and

[1212] FIG. 9D shows apparatus for carrying out a preferred embodiment.

[1213] Methods which are potentially polymorphic may be, in fact, non-polymorphic at a particular time. If such a Method is non-polymorphic at the time of compilation, optimisations can be made in the compilation of the call relating to the Method, in particular to the code relating to the call to the Method.

[1214] As each section of code is considered by the virtual machine for compilation, any potentially polymorphic invocations out of the code section are also considered. For each such invocation, if the destination of the invoke is fixed at the time of compilation (that is, there is only one implementation of that Method in the system at that time), then the assumption is made by the compilation system that that situation will continue to be so. This allows various optimisations to be made, including but not limited to the in-lining of the single implementation of the invoked Method.

[1215] For example, Class X defines a Method public int foo (int) which includes a call to a Method bar.

    Class X
        public int foo (int)
        {
            :
            call bar ( )
            :
            :
        }
        public int bar ( )
        {
            // body of bar from class X
            :
        }

[1216] The call within public int foo (int) effects a jump out of that section of code to bar. Once bar has been carried out, the thread will often return to the original function, in this case by jumping back into public int foo (int). Thus, at least two jumps are required in the execution of public int foo (int).

[1217] If a decision is made to produce a compiled version of the section of code including the call to bar, various optimisations may be made to the compilation of the call to bar if an assumption is made that bar is final. Unless it is known that bar is not final, the assumption that it is final is made at compilation. Two examples of optimisation of the call to bar are in-lining and patching.

[1218] In the optimisation method of in-lining, the Method bar is moved to be in line with the rest of the code of public int foo (int) from which bar is called.

    Class X
        public int foo (int)
        {
            :
            // call bar ( )
            {
                // body of bar from class X
                :
            }
            :
        }

[1219] Optimisation is achieved because the jumps to and from bar are not required and there is no code for frame creation and destruction required. The compiled code of bar is placed sequentially in the Method. This in turn exposes further opportunities for optimisation.

[1220] In patching, a direct link (patch) is made between the call to bar and the bar Method. The patch is a piece of code which effects the jump to the bar Method. By using a patch, the execution of the jump to bar can be made faster at run-time. The formation of patches is described in more detail in Agent's Reference No. 12 of this specification.

[1221] The optimisation has been carried out using the assumption that the Method bar is final. Before the resulting compiled version of the code is made available to the rest of the virtual machine as a potential part of the execution environment, the Method being invoked is marked in its VM data structure as having compiled code which assumes that it is not overridden (that it is a final Method). In the example given above, the mark is made in the data structure associated with the Method bar which indicates that if anything is done to override the Method, something must be done in view of the assumption made.

[1222] At some later time, Class Y (a sub-class of Class X) may be introduced which includes a new version of the Method bar.

    Class Y extends X
        :
        public int bar ( )
        {
            // body of bar from class Y
            :
        }

[1223] The assumption that bar is final has already been made for the existing compiled code and adjustment of the compiled code is required.

[1224] FIG. 9A shows a flow diagram in respect of a preferred embodiment in which a class loader checks for instance Methods of a new class being loaded.

[1225] The class loader finds a new class for loading into the system at step 4000. The class loader does not load the class until various checks have been carried out. At a first step 4002, the class loader carries out standard checks to see if the class is a valid one for the system in which it is to be loaded. If not, an exception is thrown. If it is, the class loader looks for instance Methods in the class to be loaded at step 4006. In step 4008, the class loader looks at the Method to see whether it is a Method that overrides another Method, for example, whether it overrides a Method of a parent class of the class being loaded. If not, the class loader looks at the next Method. If the Method is found to override a parent Method, the class loader looks at the data structure of the parent Method at step 4012 to see if it has been marked. As described above, the marker would indicate that a compiled call to a compiled version of the Method has been prepared on the assumption that the parent Method is final. If there is no marker, the class loader proceeds to look for another Method in the new class as shown by path 4014. If there is a marker, the class loader calls into the compilation system at step 4016 to indicate that something has to be done about the compiled code that was compiled on the assumption that the parent Method was final. The class loader then proceeds to look for another Method of the class to be loaded as shown by path 4018.
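
The flow of FIG. 9A might be sketched in Java along the following lines. This is an illustrative model only: the types MethodRecord, ClassRecord, CompilationManager and CheckingClassLoader are hypothetical, the validity check of step 4002 is reduced to a placeholder, and clearing the marker once the compilation system has been notified is an assumption not stated in the text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical VM-side record for a Method; the flag models the 'assumed final' marker.
class MethodRecord {
    final String owner;
    final String name;
    boolean assumedFinal;
    MethodRecord(String owner, String name) { this.owner = owner; this.name = name; }
}

// Hypothetical class description: a name, an optional parent and its instance Methods.
class ClassRecord {
    final String name;
    final ClassRecord parent;
    final Map<String, MethodRecord> methods = new HashMap<>();

    ClassRecord(String name, ClassRecord parent, String... methodNames) {
        this.name = name;
        this.parent = parent;
        for (String m : methodNames) methods.put(m, new MethodRecord(name, m));
    }

    // Finds the nearest inherited implementation of a Method, or null if there is none.
    MethodRecord findInherited(String methodName) {
        for (ClassRecord c = parent; c != null; c = c.parent) {
            MethodRecord m = c.methods.get(methodName);
            if (m != null) return m;
        }
        return null;
    }
}

// Stand-in for the compilation system that the class loader calls into (step 4016).
class CompilationManager {
    final List<String> invalidated = new ArrayList<>();

    void assumptionBroken(MethodRecord parentMethod) {
        parentMethod.assumedFinal = false; // assumption: the marker is cleared once handled
        invalidated.add(parentMethod.owner + "." + parentMethod.name);
    }
}

class CheckingClassLoader {
    private final CompilationManager manager;
    CheckingClassLoader(CompilationManager manager) { this.manager = manager; }

    void load(ClassRecord newClass) {
        // Step 4002: placeholder for the standard validity checks.
        if (newClass.name.isEmpty()) throw new IllegalArgumentException("invalid class");
        for (MethodRecord m : newClass.methods.values()) {               // step 4006
            MethodRecord parentMethod = newClass.findInherited(m.name);  // step 4008
            if (parentMethod == null) continue;                          // not an override
            if (parentMethod.assumedFinal) {                             // step 4012
                manager.assumptionBroken(parentMethod);                  // step 4016
            }
        }
        // Only after every Method has been checked is the class treated as loaded.
    }
}
```

In this sketch, loading a Class Y that overrides bar after bar of Class X has been marked results in exactly one call into the compilation manager, for X.bar.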

[1226] In the following, the optimisation made in the compilation of the code with the assumption that bar was final is a patch. FIG. 9B shows a section of compiled code 4020 including a fragment of code including a call 4022 to the Method bar. In the optimisation, a patch is made such that the call 4022 is made to transition directly (path 4027) to the compiled class X form of the Method bar 4028.

[1227] Where the assumption that the Method bar is final is found to be incorrect, the patch is undone as follows:

[1228] The call 4022 is changed to transition (path 4023) to the general call 4024 to the Method. The redirection of the call 4022 is carried out atomically.

[1229] The form of the general call 4024 was prepared, as an outlier, at the same time as the compiled Method foo 4020 was created, and the general call 4024 can transition (path 4029 or path 4025) to a number of different implementations of the Method bar (4028 or 4026).
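
The atomic redirection of the call can be modelled, very loosely, in Java. In the sketch below a single atomically updated reference stands in for the patched jump; the class PatchedCallSite and its members are hypothetical, and a real implementation would rewrite a machine-level branch rather than swap an object reference.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

class X {
    String bar() { return "X.bar"; }
}

class Y extends X {
    @Override
    String bar() { return "Y.bar"; }
}

// Hypothetical model of the call 4022 of FIG. 9B.
class PatchedCallSite {
    // Direct target: always the class X form of bar (models path 4027 to 4028).
    static final Function<X, String> DIRECT = receiver -> "X.bar";
    // General call: chooses the implementation from the receiver (models 4024).
    static final Function<X, String> GENERAL = X::bar;

    private final AtomicReference<Function<X, String>> target =
            new AtomicReference<>(DIRECT);

    String invoke(X receiver) {
        return target.get().apply(receiver);
    }

    // Undoing the patch is a single atomic store, so a concurrently executing
    // thread observes either the old target or the new one, never a mixture.
    void undoPatch() {
        target.set(GENERAL);
    }
}
```

Before the patch is undone, the direct target would give the wrong result for an instance of Class Y, which is why the redirection has to take place before the loading of Class Y is completed.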

[1230] For further details of the redirection and deletion of patches, see Agent's Reference No. 12 of this specification.

[1231] In the following, the optimisation made in the compilation of the code with the assumption that bar was final is in-lining. FIG. 9C shows sections of compiled code 4030 including a first section 4032, a second section 4034 and a third section 4036. The second section 4034 is a compiled version of code including a call to bar 4038. The Method bar has been in-lined so that bar is now contained in the section 4034 as section 4038.

[1232] If it is later found that the assumption is incorrect, compiled code section 4034 will be deleted. The dispatch table of the Method including the section 4034 is altered so as not to refer to the compiled version of the code 4034.

[1233] On subsequent execution of the compiled code 4030, the section of compiled code 4032 will be executed first, and at the end of section 4032, control passes to glue code. The glue code looks to see whether there is a compiled version of the next section 4034. The compiled section is not found and so preparations are made to transfer control to the interpreter for further execution.

[1234] Control may be passed first to an outlier to update states. (See Agent's Reference No. 3 of this specification).

[1235] The glue code tells the interpreter to begin execution of the non-compiled version of the code corresponding to section 4034.
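
The behaviour of the glue code after section 4034 has been made unavailable can be illustrated by the following Java sketch. The class GlueCode, the use of a map as the dispatch table and the string results are all hypothetical simplifications; the outlier step that updates states is omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model of glue code choosing between compiled code and the interpreter.
class GlueCode {
    // Models the dispatch table: fragment identifier -> compiled version, if any.
    private final Map<Integer, Supplier<String>> dispatchTable = new HashMap<>();

    void install(int fragmentId, Supplier<String> compiled) {
        dispatchTable.put(fragmentId, compiled);
    }

    // Makes the compiled fragment unavailable without yet freeing its memory.
    void isolate(int fragmentId) {
        dispatchTable.remove(fragmentId);
    }

    // Runs the compiled version when one is registered, otherwise falls back
    // to interpreting the non-compiled code for that fragment.
    String run(int fragmentId) {
        Supplier<String> compiled = dispatchTable.get(fragmentId);
        if (compiled != null) {
            return compiled.get();
        }
        return interpret(fragmentId);
    }

    private String interpret(int fragmentId) {
        return "interpreted " + fragmentId;
    }
}
```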

[1236] At a later time, the compiled section 4034 will be deleted. Stack walking will be carried out before the section is deleted. (See Agent's Reference No. 6 of this specification).
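
As a rough illustration of the check made before deletion, the Java sketch below treats each thread's stack as a list of return addresses and reports whether any of them lies inside the region to be deleted. The types CodeRegion and StackWalker and the representation of stacks are hypothetical; the stack walking actually contemplated is described in Agent's Reference No. 6 and may differ substantially.

```java
import java.util.List;

// Hypothetical address range occupied by a compiled section, end exclusive.
class CodeRegion {
    final long start;
    final long end;

    CodeRegion(long start, long end) {
        this.start = start;
        this.end = end;
    }

    boolean contains(long address) {
        return address >= start && address < end;
    }
}

class StackWalker {
    // Returns true only if no frame of any thread would return into the region.
    static boolean safeToDelete(CodeRegion region, List<List<Long>> threadStacks) {
        for (List<Long> stack : threadStacks) {
            for (long returnAddress : stack) {
                if (region.contains(returnAddress)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```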

[1237] Thus, it will be seen that the patch optimisation is more easily undone than in-lining if it is subsequently found that the assumption that a Method is final is not correct. However, better optimisation and reduced execution time are available from the use of in-lining and in many cases in-lining will be preferred if it is thought that the assumptions will be proved incorrect only infrequently.

[1238] In summary, as each new class is loaded, the class loader checks to see if any of the Methods of the new class override a Method with the marker set in its data structure. If this is the case, the class loader calls back to the compiled code manager section of the virtual machine and requests that all the affected compiled code is deleted or made inaccessible.

[1239] If the compiled version of the calling code is arranged not to make many assumptions about the internal details of the Method it is invoking, a simpler mechanism that can be used in parallel with the above mechanism is to allow patching of the compiled version of the calling code to call directly to the compiled version of the Method being called. This direct patch can be relatively easily undone if a subsequently loaded class is found to override the Method in question using the same detection mechanism as described above. The benefit of the patched version is that it avoids the overheads of making the dynamic decision at the time of the invoke as to which implementation to choose to invoke. Even if there is only one possibility, the overhead is present unless the patched form is used.

[1240] FIG. 9D shows apparatus for carrying out a preferred embodiment. The apparatus includes a compilation system 4040 including a compiler 4042 and a compiler manager 4044. The compiler 4042 has optimisation devices 4046 and 4048 for creating patches and in-lining, respectively. The compiler manager 4044 includes a marking device 4050 for marking a Method to indicate that a call to it has been compiled on the basis of an assumption that it is final.

[1241] The apparatus further includes a class loader 4052 for loading new classes. The class loader has a Method checker 4054 for determining if a Method of the class being loaded will override a Method which has been compiled on the assumption that the Method is final. The Method checker 4054 will search for markers in the data structure of the Methods. If an overridden Method is found, the class loader 4052 notifies the compiler manager 4044 which uses the alteration device 4056 to make necessary alterations to the compiled code. The alteration device 4056 includes an isolation device 4058 to make the relevant section of compiled code unavailable for execution. The alteration device 4056 further includes a patch undoing device 4060 and a deletion device 4062 for deleting, for example, sections of unwanted compiled code. The alteration device 4056 also includes a stack walker 4064 for allowing the compiled code safely to be deleted.

[1242] The apparatus further includes an execution device 4066 for executing compiled code. Glue code 4068 and outliers 4070 are provided for effecting the transfer to execution by an interpreter 4072, where required. The interpreter 4072 includes an execution history recorder 4074 for recording the execution of blocks of code by the interpreter. That information is used for the compilation of the dominant path (see Agent's Reference No. 1 of this specification).

[1243] In any or all of the aforementioned, certain features of the present invention have been implemented using computer software. However, it will of course be clear to the skilled man that any of these features may be implemented using hardware or a combination of hardware and software. Furthermore, it will be readily understood that the functions performed by the hardware, the computer software, and such like, are performed on or using electrical and like signals.

[1244] Features which relate to the storage of information may be implemented by suitable memory locations or stores. Features which relate to the processing of information may be implemented by a suitable processor or control means, either in software or in hardware or in a combination of the two.

[1245] In any or all of the aforementioned, the invention may be embodied in any, some, or all of the following forms: it may be embodied in the computer system itself; it may be embodied in a computer system when programmed with or adapted or arranged to execute the method of operating that system; and/or it may be embodied in a computer-readable storage medium having a program recorded thereon which is adapted to operate according to the method of operating the system.

[1246] As used herein throughout, the term ‘computer system’ may be interchanged for ‘computer,’ ‘system,’ ‘equipment,’ ‘apparatus,’ ‘machine,’ and like terms. The computer system may be or may include a virtual machine.

[1247] In any or all of the aforementioned, different features and aspects described above, including method and apparatus features and aspects, may be combined in any appropriate fashion.

[1248] It will be understood that the present invention(s) has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.

[1249] Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

[1250] Agent's Reference No. 10—Computer System, Computer-Readable Storage Medium and Method of Operating Same, and Method of Operating That System

[1251] This invention relates generally to data structure access, in particular, but not exclusively, in a multi-threaded environment. In a preferred embodiment, the invention relates to reducing search times for unordered lists in a multi-threaded environment.

[1252] In a multi-threaded environment, extreme care must be taken whenever shared data structures (i.e., those able to be accessed by more than one thread at the same time) are modified. Without this care, threads may see partially updated data and thus obtain a corrupt view of the data structure. A frequent implementation technique is to lock access to the data structure with a mutually-exclusive access mechanism, a ‘mutex.’ This permits access by one thread at a time through code that makes the modification. The result is a very slow process that gets even slower the more threads there are competing for access. Some data structures have many times more accesses that read the data structure than accesses that make modifications, and these benefit from an access mechanism that does not use a mutex for accesses that just read the data.

[1253] A first aspect of this invention relates in particular to a method of accessing a particular entry in a list of entries in a computer system, including the steps of: reading a start pointer to one of the entries; examining the entries in the list in turn commencing with the entry pointed to by the start pointer until the particular entry is found; and accessing the particular entry which has been found.

[1254] In a known implementation of this method, the list has a distinct start and a distinct end; the start pointer always points to the start of the list; and each entry includes a pointer to the next entry in the list, except the last entry which has a null for its next entry pointer.

[1255] If the particular entry to be accessed is at the end of the list, then it is necessary to examine all of the entries in the list before the particular entry is found. In many applications, there is an above average probability that the particular entry which has been found will be the entry which is required the next time the list is accessed. In this case, with the known implementation of the method, there is therefore an above average probability that if it has been necessary to examine all of the entries in the list before the particular entry is found, then on the next access it will also be necessary to examine all of the entries in the list before the particular entry is found.

[1256] In a further technique, when a particular entry is found, the list is reordered to move the entry found to the front of the list. Thus, the entry found will be the first to be looked at the next time the list is accessed. Where the list is reordered in that way, it is necessary to lock access to the data structure with a mutex. If two threads tried to reorder the list at the same time, corruption of the list would be likely.

[1257] According to a first aspect of the invention, there is provided a method of accessing a particular entry in a list of entries in a computer system, each of the entries including a respective next entry pointer which points to an entry in the list so that the next entry pointers together form a closed loop, the method including the steps of: reading a start pointer to one of the entries; examining the entries in the list in turn commencing with the entry pointed to by the start pointer until the particular entry is found, in which the next entry pointer for an entry being examined is read in order to determine which of the entries to examine next; accessing the particular entry which has been found; and overwriting the start pointer so as to point to the particular entry which has been found so that in a repeat of the aforementioned steps for the same or a different particular entry, the examining step commences with examining the first-mentioned particular entry.

[1258] By overwriting the start pointer so as to point to the particular entry which has been found, so that in a repeat of the aforementioned steps for the same or a different particular entry, the examining step commences with examining the first-mentioned particular entry, advantage is therefore taken of the fact that, in many applications, there is an above average probability that the particular entry which has been found will be the entry which is required the next time the list is accessed, in order to make accessing quicker and more efficient.

[1259] According to the invention, the entries each include a respective next entry pointer which points to an entry in the list; and in the examining step, the next entry pointer for such an entry being examined is read in order to determine which of the entries to examine next. Accordingly, the list can be thought of as being an endless loop, rather than a list with a distinct start and with a distinct end with a null next entry pointer, as in the known implementation described above.

[1260] In the case in which the list has only one entry, the entry's next entry pointer will point to itself. Usually, however, the next entry pointer will point to a different one of the entries in the list.

[1261] In order to prevent the method endlessly looping, in the case where the particular entry is not found during the examining step, preferably the examining step is terminated once each of the entries has been examined once, and the accessing and overwriting steps are omitted.

[1262] A second aspect of this invention provides a method of operation of a computer system, including the steps of executing a plurality of threads, each thread performing a respective accessing method according to the first aspect of the invention in respect of a common such list of entries, each accessing method reading a common such start pointer in respect of the list of entries.

[1263] In the methods of the first and second aspects of the invention, the step of overwriting the start pointer is preferably atomic, whether naturally or by special design. This is of particular benefit in a multi-threaded environment. Where the pointer position is able to be changed atomically, the risk of data corruption when two threads attempt to change the pointer position at the same time is reduced. Thus it is made possible to allow the change of the pointer position without the protection of a mutex. Also, the step of accessing the particular entry is preferably a read accessing step.

[1264] A third aspect of the invention provides a method of forming alist of entries in a computer system, including the steps of:

[1265] providing each entry with a next entry pointer;

[1266] arranging the next entry pointers to form a closed loop of entry pointers;

[1267] providing a start pointer for pointing to an entry, the pointer being able to be overwritten to point to a different entry.
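The three forming steps listed above can be illustrated with a minimal sketch. It is not taken from the specification: the class names `Entry` and `LoopedList` are hypothetical, and Python object references stand in for the pointers, with `next_ptr` and `list_head` borrowed from the later description of FIG. 10A and FIG. 10B.

```python
class Entry:
    """One list entry: a key, its data, and a pointer to the next entry."""

    def __init__(self, key, data):
        self.key = key
        self.data = data
        self.next_ptr = self  # a lone entry points to itself


class LoopedList:
    """Hypothetical container holding the start pointer for a closed loop."""

    def __init__(self, items):
        self.list_head = None          # start pointer; may later be overwritten
        previous = None
        for key, data in items:
            entry = Entry(key, data)   # each entry gets a next entry pointer
            if self.list_head is None:
                self.list_head = entry
            else:
                previous.next_ptr = entry
            previous = entry
        if previous is not None:
            # Close the loop: the last entry points back to the first
            # instead of holding a null pointer.
            previous.next_ptr = self.list_head
```

Following `next_ptr` from any entry eventually returns to that entry, which is the property the accessing method relies on when the start pointer is moved.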

[1268] A fourth aspect of the invention provides a method of operating a computer system including a method of forming a list of entries according to the third aspect of the invention and a method of accessing an entry according to the first aspect of the invention.

[1269] A fifth aspect of the present invention provides a computer system which is programmed to perform the method of the first or second aspect of the invention.

[1270] A sixth aspect of the present invention provides a computer system including: means for storing a list of entries; means for storing a start pointer to one of the entries; means for reading the start pointer; means for examining the entries in the list in turn commencing with the entry pointed to by the start pointer until a particular entry is found; and means for accessing the particular entry which has been found; characterised by: means for overwriting the start pointer so as to point to the particular entry which has been found.

[1271] A seventh aspect of the present invention provides a computer memory in which are stored a list of entries and a start pointer to one of those entries, each entry including a respective next entry pointer, wherein all of the next entry pointers point to an entry in the list.

[1272] Preferably the next entry pointers together form a closed loop.

[1273] An eighth aspect of the present invention provides a computer system including: a memory according to the seventh aspect of the invention; and a processor programmed to: read the start pointer; examine the entries in the list in turn commencing with the entry pointed to by the start pointer until a particular entry is found; and access the particular entry which has been found; characterised further in that: the processor is programmed to rewrite the start pointer so as to point to the particular entry which has been found.

[1274] A ninth aspect of the present invention provides a method of accessing data in a list in a computer system, including the steps of: arranging the list in the form of a loop; accessing a given element in the loop; and selecting that element as being the start of the loop for the next access.

[1275] A tenth aspect of the present invention provides a computer system for accessing data in a list, including: means for arranging the data in the form of a closed loop; means for accessing a given element in the loop; and means for selecting that element as the start of the loop for the next access.

[1276] In the above aspects of the invention, at least some of the entries or elements preferably each include a respective segment (or chunk) of compiled code, and/or at least some of the entries or elements preferably each include a respective key.

[1277] An eleventh aspect of the present invention provides a computer-readable storage medium having a computer program recorded thereon executable to cause a computer system to perform any of the method aspects of this invention, or to operate in accordance with any of the system aspects of this invention.

[1278] The principal advantages of at least some embodiments of the invention are a reduction in access time to the data in the list and the avoidance of the need for a mutually-exclusive access mechanism, otherwise known as a mutex.

[1279] The method is particularly advantageous in a multi-threaded environment. The selection is advantageously performed as a single write operation, i.e., it is atomic. This would be of great advantage in a multi-threaded environment if stability were to be maintained.

[1280] This invention, or at least specific embodiments of it, provides an optimisation in the accessing of unordered, singly linked lists that can be read without a mutex. It does not address the problem of inserting new entries into such a list, nor the more difficult problem of removing old entries, but neither does it increase the complexity of either task. Where modifications of this type are required, some sort of valve mechanism would preferably be provided. Similarly, if the list is an ordered list, the invention is not normally applicable.

[1281] Any, some, or all of the features of any aspect of the invention may be applied to any other aspect.

[1282] Preferred features of the present invention are now described, purely by way of example, with reference to the accompanying drawings, in which:

[1283] FIG. 10A shows a link list;

[1284] FIG. 10B shows a looped link list;

[1285] FIG. 10C shows the movement of a pointer in a looped link list; and

[1286] FIG. 10D illustrates a preferred embodiment of apparatus.

[1287] The data structure of FIG. 10A includes a list of entries 26110, 26112, . . . 26106, and a pointer list_head 26104 to the start of the list. Each of the entries includes a pointer next_ptr to the next entry in the list, a key and data. In the last entry 26106 in the list, the value of the next entry pointer next_ptr is the NULL value.

[1288] If the data structure illustrated schematically in FIG. 10A is accessed for reading very frequently, and if the access mechanism without any mutex is efficient, then the time taken to acquire and release the mutex may become a significant proportion of the time to access the data structure.

[1289] In the FIG. 10A example, the thread would normally enter the list at the first element 26110 via the list_head pointer 26104 and move sequentially through the others in the list. If the list_head pointer 26104 were to be moved to, for example, the second element 26112, then a thread entering the list at that point would not ‘see’ all of the elements in the list. To overcome that difficulty, the list would have to be re-ordered so that all the elements could be seen, but then a mutex would have to be provided.

[1290] With reference especially to FIG. 10B, in the embodiment of the invention, by making the list into a loop by the addition of a next_ptr pointer in the entry 26106 at what was the end of the list, any thread can independently change the list_head pointer 26104 to the start of the loop to indicate the most likely element to be accessed next time. On the subsequent access the item to be searched for has become more likely to be the first item looked at. This is because every time a thread finds an element it is looking for, it rewrites the list_head pointer 26104 so that, at the next access to the loop, the next thread will be directed to the last element that was accessed, as shown by the broken lines 26312 in FIG. 10C, the assumption being that it is the most likely to be needed again.

[1291] With this embodiment of the invention, the thread will access the loop at the last point accessed and will go round the loop until it finds the element it requires. It is immaterial if more than one thread is doing this at the same time and each thread will rewrite the list_head pointer in an attempt to cut down on access time. If two threads try to change the list_head pointer at the same time, the order in which the change occurs does not matter as long as each change is atomic. Quite frequently the change is naturally atomic but, if not, it can readily be ensured to be so. It is much cheaper in computing terms to change the pointer (atomically) than it is to provide mutexes.

[1292] If a thread wants to add or delete an entry, a mutex is imposed to prevent another thread attempting to do the same thing at the same time. However, a read thread will not be impeded, since mutexes do not apply to read-only threads. Modifications will appear atomic to the read threads, although the modifications themselves cannot be made atomically.

[1293] As mentioned above, in FIG. 10A, the terminating entry of the traditional list is designated by a null pointer at node 26106. The list_head pointer 26104 points at the first node, 26110. The embodiment of the invention replaces the null pointer at node 26106 with a next_ptr pointer 26202 (FIG. 10B) to the start of the list 26110. This creates a cyclic loop rather than the more traditional list. By implementing the data structure as a loop we have created the property that the list effectively has no natural starting node. Whichever node we choose can be treated as a head-of-list, processing being achieved by visiting all nodes until the start point is again reached. So whereas we would process a traditional list with:

    ptr = list_head;
    while (ptr != NULL) do
        if (ptr->key == key) then
            return ptr->data
        endif
        ptr = ptr->next
    endwhile

[1294] the same effect is achieved in an embodiment of the invention when processing a loop by the algorithm:

    ptr = list_head;
    first_ptr = ptr;
    if (ptr != NULL) then
        do
            if (ptr->key == key) then
                return ptr->data
            endif
            ptr = ptr->next
        while (ptr != first_ptr)
    endif

[1295] The benefit of the embodiment of the invention is achieved by allowing the read access to re-write the list_head without a mutex, at step 15:

    10  ptr = list_head;
    11  first_ptr = ptr;
    12  if (ptr != NULL) then
    13      do
    14          if (ptr->key == key) then
    15              list_head = ptr
    16              return ptr->data
    17          endif
    18          ptr = ptr->next
    19      while (ptr != first_ptr)
    20  endif

[1296] Since any node within the loop can equally validly be treated as the head of the list, provided a thread can atomically update the list_head, no mutex is required. That is, if two threads update the list_head at almost the same time, it does not matter which thread atomically writes first; the data structure always remains consistent.

[1297] In the above process, first_ptr is set equal to ptr at step 11, which in turn has been set equal to list_head at step 10, and the test in step 19 is made with respect to first_ptr, rather than list_head, so that a different thread can change list_head in the meantime without it preventing the loop between steps 13 and 19 possibly testing the key of all of the entries in the loop.

[1298] In any environment where the list is unordered but there is an above average probability that the last item found in a search of the list will also be asked for the next time the list is searched, then by changing the list_head as described above we reduce the number of nodes visited in the search, and hence the search time.
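The search described above, including the private first_ptr copy and the rewrite of list_head on a hit, can be sketched as follows. This is an illustrative sketch only: `Entry`, `LoopedList` and the `examined` counter are hypothetical names, and it assumes that a single attribute assignment behaves as one indivisible reference store (as is the case in CPython). In another language the store at step 15 would need to be an explicitly atomic pointer write.

```python
class Entry:
    """One list entry holding a key, its data, and a pointer to the next entry."""

    def __init__(self, key, data):
        self.key = key
        self.data = data
        self.next_ptr = self  # a lone entry points to itself


class LoopedList:
    def __init__(self, items):
        # Link the entries and point the last one back at the first.
        entries = [Entry(key, data) for key, data in items]
        for entry, following in zip(entries, entries[1:] + entries[:1]):
            entry.next_ptr = following
        self.list_head = entries[0] if entries else None

    def find(self, key):
        """Return (data, entries_examined); data is None if the key is absent."""
        ptr = self.list_head        # step 10: read the shared start pointer once
        first_ptr = ptr             # step 11: private copy, unaffected by later rewrites
        examined = 0
        if ptr is not None:
            while True:
                examined += 1
                if ptr.key == key:
                    self.list_head = ptr    # step 15: one reference store, no mutex
                    return ptr.data, examined
                ptr = ptr.next_ptr
                if ptr is first_ptr:        # step 19: back where this search began
                    break
        return None, examined
```

With five entries keyed 1 to 5 and the start pointer at key 1, a first `find(5)` examines five entries, while an immediately repeated `find(5)` examines only one, because the start pointer now refers to that entry. An unsuccessful search stops after one full circuit and leaves the start pointer unchanged.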

[1299] As will be seen, the invention is particularly effective and simple and is cheap to implement. In addition, the invention does not complicate add/delete procedures and can be effected without the need for mutexes.

[1300] One example of such an environment is in a virtual machine where a hash-table with chains is used to map between bytecodes in the source Java and any equivalent compiled host code.

[1301] It is not unusual for there to be many nodes in an unordered list, for example up to about 10,000 or even more. It would be impractical to form a single chain with such a large number since the search time through a single list of such a size would be inordinately long. It is the practice, therefore, to create separate chains, each with a manageable number of nodes or elements. The computer system would then require some kind of addressing device or software to lead a visiting thread into the correct chain.

[1302] Where there are only two chains, to choose an elementary example, a simple test on the key would suffice. This test may involve pointing to one chain of buckets if the key is even and a different chain if the key is odd. This system can work satisfactorily where there are comparatively few buckets per chain. However, the norm is for there to be tens or hundreds of buckets per chain and in the situation where there may be in the region of 10,000 buckets, there will be a sizeable number of chains to manage. This situation may be best handled by the use of a look-up (preferably a hash) table. Again, a simple test on the key, such as division by a prime number, can be used to separate and identify one chain from another. It is also preferable for there to be about the same number of entries allocated to each such chain. The hash algorithm will then need to be chosen appropriately. An executive decision is normally necessary as to how broadly to define the hash.
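As a rough illustration of the chained arrangement just described, the sketch below selects a chain from the remainder of the key on division by a prime and keeps each chain as a closed loop with its own start pointer. The names `ChainTable`, `make_loop` and `num_chains`, the choice of 7 as the prime, and the use of integer keys are all assumptions made for the example rather than details given in the specification.

```python
class Entry:
    def __init__(self, key, data):
        self.key = key
        self.data = data
        self.next_ptr = self  # a lone entry points to itself


def make_loop(items):
    """Link (key, data) pairs into a closed loop and return its first entry."""
    entries = [Entry(key, data) for key, data in items]
    for entry, following in zip(entries, entries[1:] + entries[:1]):
        entry.next_ptr = following
    return entries[0] if entries else None


class ChainTable:
    def __init__(self, items, num_chains=7):
        # num_chains is a small prime here; key modulo that prime picks the chain.
        grouped = [[] for _ in range(num_chains)]
        for key, data in items:
            grouped[key % num_chains].append((key, data))
        self.heads = [make_loop(group) for group in grouped]  # one start pointer per chain

    def lookup(self, key):
        index = key % len(self.heads)
        ptr = first_ptr = self.heads[index]
        while ptr is not None:
            if ptr.key == key:
                self.heads[index] = ptr  # the entry found becomes this chain's start
                return ptr.data
            ptr = ptr.next_ptr
            if ptr is first_ptr:         # full circuit of this chain, key absent
                break
        return None
```

Only the chain selected by the key is searched, so the cost of a lookup depends on the length of one chain rather than on the total number of entries.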

[1303] Referring to FIG. 10D, a computer system 26100 for performing the methods described above includes a memory 26102 for storing the list_head pointer 26104 and the list of entries 26110, 26112 . . . 26106, a processor 26107 for accessing the memory 26102 and performing the methods, and a storage medium 26108 bearing a program readable by the processor 26107 for programming the processor 26107 to perform the methods.

[1304] In any or all of the aforementioned, certain features of the present invention have been implemented using computer software. However, it will of course be clear to the skilled man that any of these features may be implemented using hardware or a combination of hardware and software. Furthermore, it will be readily understood that the functions performed by the hardware, the computer software, and such like, are performed on or using electrical and like signals.

[1305] Features which relate to the storage of information may be implemented by suitable memory locations or stores. Features which relate to the processing of information may be implemented by a suitable processor or control means, either in software or in hardware or in a combination of the two.

[1306] In any or all of the aforementioned, the invention may be embodied in any, some, or all of the following forms: it may be embodied in a method of operating a computer system; it may be embodied in the computer system itself; it may be embodied in a computer system when programmed with or adapted or arranged to execute the method of operating that system; and/or it may be embodied in a computer-readable storage medium having a program recorded thereon which is adapted to operate according to the method of operating the system.

[1307] As used herein throughout, the term ‘computer system’ may be interchanged for ‘computer,’ ‘system,’ ‘equipment,’ ‘apparatus,’ ‘machine,’ and like terms. The computer system may be or may include a virtual machine.

[1308] In any or all of the aforementioned, different features and aspects described above, including method and apparatus features and aspects, may be combined in any appropriate fashion.

[1309] It will be understood that the present invention(s) has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.

[1310] Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

[1311] Agent's Reference No. 11—Computer System, Computer-Readable Storage Medium and Method of Operating Same, and Method of Operating That System

[1312] This invention relates to the testing of computer code which is a particular implementation of a particular specification. In a preferred embodiment, the invention relates to a method for automatic testing and verification of dynamically compiled code in a virtual machine.

[1313] Errors in dynamically compiled code frequently manifest themselves a long time after the error actually occurred, making it difficult to identify the true cause. An error may appear benign when it occurs (for example an incorrect calculation which is not immediately used), but its effects may be disastrous at some future time or event (for example, when the incorrect value is used).

[1314] When changing and/or adding optimisations to a dynamic compiler, it is difficult to demonstrate that the code produced as a result is correct. The invention is therefore concerned with testing for such errors.

[1315] In one known technique, testing as such was not conducted in a forward-looking sense. Instead, when an error was noted, the process would be investigated backwards to locate the origin of the error. This technique was clearly open to the risk of potentially disastrous errors occurring unnoticed until too late.

[1316] In another known technique which is an improvement over the previous one just mentioned, two execution engines are used within the same process and their results are compared. One execution engine is the trusted implementation (the master) and the other is the implementation under test (the slave). This test process is limited to a singly-threaded application and can be both cumbersome and time-consuming, since the execution engines must be run in series. The process is to save the initial state (state 1), run part of the master, save the final state of the master (state 2), restore state 1, run part of the slave, then check the final state of the slave against the saved state 2 to detect discrepancies.

[1317] The testing technique implemented in Softwindows (by Insignia) was of such a type as just outlined. While effective for its purpose it would be fair to say that it was limited in that it was only applicable to single threaded environments and, when applied to a CPU emulator, had an executable that was simply enormous. The executables for the master and slave were in the same executable so testing had to be done in series. Moreover, the testing technique could itself introduce bugs and dissimilarities between master and slave. The points at which comparisons of state would have been carried out were largely only at transfers of control.
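The known in-series sequence (save state 1, run part of the master, save state 2, restore state 1, run part of the slave, compare) might be sketched as below. This is only an illustration of that sequence under simplifying assumptions: the state is modelled as a dictionary of named values, the two engines are plain callables that mutate it, and `compare_in_series` is a hypothetical name.

```python
import copy

_MISSING = object()  # marks a state element present on one side only


def compare_in_series(initial_state, run_master_part, run_slave_part):
    """Run master then slave from the same saved state; return differing element names."""
    state_1 = copy.deepcopy(initial_state)   # save the initial state (state 1)

    working = copy.deepcopy(state_1)
    run_master_part(working)                 # run part of the master
    state_2 = copy.deepcopy(working)         # save the master's final state (state 2)

    working = copy.deepcopy(state_1)         # restore state 1
    run_slave_part(working)                  # run the same part of the slave

    # Check the slave's final state against the saved state 2.
    names = sorted(set(state_2) | set(working))
    return [name for name in names
            if state_2.get(name, _MISSING) != working.get(name, _MISSING)]
```

The two runs cannot overlap, because both work from the single saved copy of state 1, which is the serial limitation noted above.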

[1318] Techniques for identifying the cause of errors once identified tend to perturb the system under test, often to the extent of changing or removing (temporarily) the failure behaviour. The object of the invention is therefore to provide a quicker and more reliable system and method for testing pieces of executable code, preferably executable code produced by a dynamic compiler.

[1319] A first aspect of the present invention provides a method of testing a first piece of computer code which is an implementation of a particular specification against a second piece of computer code which is a different implementation of the same specification, including the steps of: defining corresponding synchronisation points in both pieces of code; executing both pieces of code; and comparing the states produced by both pieces of code at the synchronisation points.

[1320] In many cases, the first piece of code can be a trusted implementation of the specification (a ‘master’), whilst the second piece of code can be an implementation under test (a ‘slave’).

[1321] If a discrepancy is found in the states produced, then it will indicate that since the previous synchronisation point the behaviour caused by the two pieces of code has differed. The code which has been executed by the slave since the last synchronisation point can easily be identified.
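A minimal sketch of comparing states at corresponding synchronisation points is given below. It models each implementation as a Python generator that yields a pair (synchronisation point identifier, state) whenever it reaches a synchronisation point. The generators `master_sum` and `slave_sum`, the `"loop"` identifiers and the deliberate fault in the slave are all invented for the example.

```python
from itertools import zip_longest


def first_discrepancy(master, slave):
    """Return (last agreeing sync point, master report, slave report), or None if all agree."""
    previous = None
    for master_report, slave_report in zip_longest(master, slave):
        if master_report != slave_report:
            # The fault lies in code run since `previous`.
            return previous, master_report, slave_report
        previous = master_report[0]
    return None


def master_sum(values):
    """Trusted implementation: running sum of squares, reporting on every loop iteration."""
    total = 0
    for index, value in enumerate(values):
        total += value * value
        yield ("loop", index), {"total": total}


def slave_sum(values):
    """Implementation under test, deliberately wrong when value == 3."""
    total = 0
    for index, value in enumerate(values):
        total += (value * value) if value != 3 else (value * 2)
        yield ("loop", index), {"total": total}
```

For the input `[1, 2, 3, 4]` the states agree at the first two synchronisation points and differ at the third, so the search for the fault is narrowed to the code the slave ran between the second and third points.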

[1322] If a discrepancy is found, it indicates that one (or possibly both) of the pieces of code contains an error. The error is generally found in the slave if only because it is likely to be newer, more complex, and less tested than the trusted master, but nevertheless this method may identify an error in the trusted master provided that the slave is either correct or at least differently incorrect.

[1323] Preferably, the first and second pieces of code are executed by first and second different executables, respectively, e.g., a machine or machines having separate address systems and separate stacks.

[1324] This aspect of the invention is particularly applicable when the first and second pieces of code are executed by first and second different virtual machines, respectively, thus increasing efficiency. The virtual machines need not necessarily employ the same architectures and/or operating systems. The system may operate independent processes and may optionally be concurrent.

[1325] In the case where the first and second pieces of code each include native methods or functions, at least one such native method or function required by the second piece of code may be executed by the first executable (e.g., the master), with the result thereof being returned to the second executable. In this case, the method preferably further includes the step of providing from the first executable to the second executable a list of such native methods or functions which are to be executed by the first executable.

[1326] In the comparing step, for each synchronisation point in the first piece of code, the first executable (preferably the master) checks the state of the second executable at the corresponding synchronisation point in the second piece of code. For each synchronisation point in the second piece of code, the second executable (preferably the slave) saves the values of at least any of its state elements which are not up-to-date, updates the values of those state elements, transfers the values of its state elements to the first executable, and then restores the saved values of the updated state elements.
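The save, update, transfer and restore sequence on the slave side can be pictured with the following sketch. The model is an assumption made purely for illustration: stored state elements live in a dictionary `memory`, newer values that have not yet been written back are held in a second dictionary `pending`, and `send` stands for whatever channel carries the state to the first executable. None of these names comes from the specification.

```python
class SlaveState:
    def __init__(self, memory, pending):
        self.memory = dict(memory)    # state elements as currently stored
        self.pending = dict(pending)  # newer values not yet written back (hypothetical model)

    def report_at_sync_point(self, send):
        """Bring stale elements up to date, transfer the state, then undo the update."""
        # Save the stored values of the elements that are not up to date.
        # Assumes every pending name already exists in memory.
        saved = {name: self.memory[name] for name in self.pending}
        # Update those elements so the transferred state is current.
        self.memory.update(self.pending)
        # Transfer a snapshot of the state elements to the first executable.
        send(dict(self.memory))
        # Restore the saved values, leaving the slave as it was before the report.
        self.memory.update(saved)
```

For example, with `memory = {"x": 1, "y": 2}` and `pending = {"y": 5}`, the master receives `{"x": 1, "y": 5}` while the slave's stored state is `{"x": 1, "y": 2}` again once the call returns, which keeps the perturbation of the slave small.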

[1327] For increased efficiency, the first and second pieces of code are preferably executed in parallel.

[1328] This aspect of the invention is particularly applicable to pieces of code which are dynamically compiled.

[1329] The synchronisation points are preferably selected from: conditional transfers of control; method/function/procedure calls or returns; and backward transfers of control.

[1330] In the case where the first and second pieces of code each have plural threads of execution, a correspondence is preferably identified between corresponding threads produced by the first and second pieces of code, and in this case such corresponding synchronisation points are preferably defined in such corresponding threads.

[1331] Preferably, the programming language is Java and synchronisation is effected on a per thread basis. More especially, in that case there are preferably a plurality of asynchronously handled thread pairs.

[1332] Also, a correspondence is preferably identified between corresponding objects dynamically allocated by the first and second pieces of code.

[1333] A second aspect of this invention provides a computer system programmed to perform the method of the first aspect of the invention.

[1334] A third aspect of this invention provides a computer system for testing a first piece of computer code which is an implementation of a particular specification against a second piece of computer code which is a different implementation of the same specification, wherein: corresponding synchronisation points are defined in both pieces of code; and the system includes: means for executing both pieces of code; and means for comparing the states produced by both pieces of code at the synchronisation points.

[1335] A fourth aspect of this invention provides a computer system for testing a first piece of computer code which is an implementation of a particular specification against a second piece of computer code which is a different implementation of the same specification, wherein: corresponding synchronisation points are defined in both pieces of code; and the system includes: a first executable for executing the first piece of code; and a second executable for executing the second piece of code; the first executable also being operable to compare the states produced by both pieces of code at the synchronisation points.

[1336] In the fourth aspect of this invention, the first and second executables are preferably provided by first and second different virtual machines, respectively.

[1337] The systems according to the third or fourth aspects of the invention are preferably programmed to perform the method of the first aspect of this invention.

[1338] A fifth aspect of this invention provides a computer storage medium, or computer storage media, having recorded thereon a first piece of computer code which is an implementation of a particular specification and a second piece of computer code which is a different implementation of the same specification, wherein corresponding synchronisation points are defined in both pieces of code.

[1339] A sixth aspect of this invention provides a computer storage medium, or computer storage media, having recorded thereon a program to cause a computer system to perform the method of the first aspect of the invention or to operate in accordance with any of the second to fourth aspects of this invention.

[1340] Particularly where the specification is of an execution engine for Java bytecode, the two implementations are advantageously built into different virtual machines (VMs). The VM containing the trusted implementation is called the Master VM, and the VM containing the implementation under test is called the Slave VM. Both VMs execute the same application and communicate with each other at known synchronisation points to exchange and compare the states of the virtual machines.

[1341] Advantageously in the above systems and methods, thesynchronisation points may be chosen (at least) in (partial) dependenceupon (and preferably in proportion to) the length of code. This givesthe dynamic compiler the best chance of performing the sameoptimisations as when not under test and hence reduces perturbation.

[1342] In a specific embodiment of the invention, the slave VM undergoesminimal perturbation, reducing the possibility of changing the failurebehaviour. Also, the state acted on by each implementation isindependent of the state acted on by the other. Furthermore, the SlaveVM requires few extra resources for this invention, increasing itsapplicability.

[1343] In the embodiment of the invention, the onus on the untested implementation in the Slave VM is reduced. As will become apparent, the onus on the untested implementation will be simply to transmit to the trusted implementation the final states at synchronisation points, also to be described later. Rather than having to play an active role, the untested implementation is effectively passive and passes to the trusted implementation only data as requested by the trusted implementation. Both implementations will start at the same initial states so the synchronisation points will be predictable. Moreover, the trusted implementation will normally be run on a powerful target machine, so that the Master VM can be heavily instrumented, whereas the test implementation could be run on a smaller, perhaps a hand-held, target machine. It is not normally necessary to port the Master VM to the target machine on which the Slave VM is to be run.

[1344] The invention also provides a method of testing one implementation of a particular specification against a different implementation of the same specification, including the steps of:

[1345] defining corresponding synchronisation points in both implementations; executing the one implementation and the different implementation; and comparing the states produced by both pieces of code at the synchronisation points.

[1346] The invention also provides a computer system for testing one implementation of a particular specification against a different implementation of the same specification, including means for defining corresponding synchronisation points in both implementations, means for executing implementations, and means for comparing the states produced by both implementations at the synchronisation points.

[1347] Any, some, or all of the features of any aspect of the invention may be applied to any other aspect.

[1348] Preferred features of the present invention are now described, purely by way of example, with reference to the accompanying drawings, in which:

[1349] FIG. 11A shows schematically the code buffer configuration of an embodiment; and

[1350] FIG. 11B shows schematically code fragments of an embodiment.

[1351] While this method has been developed primarily for a Java virtual machine, the techniques used are more generally applicable. Reference will be made to FIGS. 11A and 11B which respectively illustrate schematically the code buffer configuration and code fragments in the implementation of the present testing technique.

[1352] Choice of Synchronisation Points

[1353] Both VMs must use the same synchronisation points. A suitable choice could contain all or some of the following: conditional transfers of control; method/function/procedure calls; method/function/procedure returns; and backward transfers of control.

[1354] The choice of synchronisation points is discussed further in the section “The Slave Virtual Machine” below.

[1355] If the virtual machine supports dynamically allocated objects, then the Master and Slave VMs must ensure that corresponding objects are identified on each VM.

[1356] If the virtual machine supports multiple threads, then the Master and Slave VMs must ensure that corresponding threads are identified on each VM and that each thread is independently synchronised.

[1357] If the virtual machine supports native methods or functions (i.e., those which are executed directly rather than via the virtual machine's execution engine), then most have to be executed solely on the Master and the return values and any necessary side-effects must be transmitted to the Slave. For example, a native function which returns the time of day would always be executed on the Master. This is because clocks running on two different machines (VMs in the present context) would rarely if ever be exactly in synchronism, and it would be a pointless and expensive exercise to cater for such discrepancies in sophisticated testing techniques. On the other hand, a native function which causes the virtual machine to exit should be executed on both Master and Slave. Spurious synchronisation errors could arise without these elementary precautions being put in place. The Master would generally contain a list of those functions which only it can execute, and it would inform the Slave whether the Slave was permitted to run a given function or, if not, what it needs to do instead.

[1358] In the case of a Java virtual machine, a native method may effect an invocation on a method written in Java. Regardless of whether the native method itself is being executed on both VMs or solely on the Master, such a Java method must be executed on both VMs.

[1359] The Master Virtual Machine

[1360] The Master (trusted) virtual machine is heavily instrumented to record all reads of the virtual machine state and all modifications of the virtual machine state.

[1361] Each execution thread synchronises independently with the corresponding execution thread on the Slave VM. The basic synchronisation loop is shown under the heading Per-thread synchronisation loop below.

[1362] Per-Thread Synchronisation Loop

MASTER VM                                SLAVE VM
MasterStart:                             SlaveStart:
  clear state info database                (wait for SB message)
  run to next sync point, gathering
    info on state reads and writes
  send SB message to Slave
  (wait for SA message)                    instantiate before values
                                           run to next sync point
                                           send SA message to Master
  check values against SB message
  goto MasterStart

[1363] The Master starts its synchronisation loop by clearing its database of state information. It then runs to the next synchronisation point, adding to its state information database when any item of the virtual machine state is read or written. The item's type and value at any read, and before and after any write, are saved.

[1364] At the synchronisation point, the Master sends a State Before (SB) message to the Slave and waits until it receives the corresponding State After (SA) message from the Slave once the Slave has reached the corresponding synchronisation point. When the Master receives the SA message from the Slave, it checks that all the virtual machine state items written by the Slave since the previous synchronisation point have the correct type and value. If any item is incorrect then the error can be communicated to the user immediately or batched for later examination. The Master can then proceed with the next iteration of the synchronisation loop.
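By way of illustration only, one iteration of the loop just described might be sketched as follows. This is a single-process simplification under stated assumptions: the names `StateLog`, `master_run_to_sync`, `sb_message`, `slave_run_to_sync` and `check` are hypothetical, VM state is modelled as a dictionary, and each operation is modelled as a pair of a state item and a function applied to it. No message transport is shown.

```python
from dataclasses import dataclass, field


@dataclass
class StateLog:
    """Hypothetical state information database for one sync interval."""
    reads: dict = field(default_factory=dict)   # item -> first value read
    writes: dict = field(default_factory=dict)  # item -> (before, after)


def master_run_to_sync(state, ops):
    """Run the trusted ops to the next sync point, logging reads and writes."""
    log = StateLog()  # a fresh log stands in for "clear state info database"
    for item, fn in ops:
        before = state[item]
        log.reads.setdefault(item, before)
        state[item] = fn(before)
        first_before = log.writes[item][0] if item in log.writes else before
        log.writes[item] = (first_before, state[item])
    return log


def sb_message(log):
    """Assumed SB payload: the 'before' values the Slave must instantiate."""
    return dict(log.reads)


def slave_run_to_sync(sb, ops):
    """Instantiate the before values, run to the sync point, return the SA payload."""
    state = dict(sb)
    written = {}
    for item, fn in ops:
        state[item] = fn(state[item])
        written[item] = state[item]
    return written


def check(log, sa):
    """Master-side check of type and value for every item the Slave wrote."""
    errors = []
    for item, value in sa.items():
        if item not in log.writes:
            errors.append((item, None, value))
            continue
        expected = log.writes[item][1]
        if type(value) is not type(expected) or value != expected:
            errors.append((item, expected, value))
    return errors
```

In this sketch an empty list from `check` means the interval matched; a non-empty list could be reported at once or batched, as the text allows.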

[1365] An optimisation to the Master loop would be to have it continue with its next synchronisation loop immediately after sending the SB message rather than waiting for the SA message from the Slave. That wait can be postponed until the Master is ready to send its next SB message, in the expectation that the wait would be very much reduced, possibly to zero. A further optimisation would be for the Master to retain a buffer of several SB messages so that it could run several synchronisation loops before having to wait for the Slave. These optimisations may be worthwhile since the Master synchronisation loop is likely to be slower than the Slave. The Master execution engine is typically a much slower implementation than the Slave execution engine and in addition is burdened with the majority of the costs of this invention.

[1366] In many embodiments, the Master will use an interpreter for the execution of the code. A dynamic compiler can then be tested on the Slave VM. Alternatively, both Master and Slave can run compiled versions of the code, or may both interpret code, for example, to test a new interpreter on the Slave VM.

[1367] The Slave Virtual Machine

[1368] The Slave virtual machine (the VM under test) must keep its virtual machine state either up to date or easily updateable at synchronisation points, so that the types and values of state items written since the previous synchronisation point can be collected and sent to the Master. It is very important that this requirement is implemented in such a way as to minimise any perturbation to the Slave's usual mode of operation. When the Slave contains an optimising dynamic compiler it is particularly important not to generate different code when testing compared to that produced in normal operation.

[1369] This can be achieved by a combination of synchronisation points carefully chosen to coincide with times when the compiled code is likely to have the necessary state available if not in the correct place, and having the dynamic compiler generate a special piece of code at synchronisation points to save the contents of any state items not yet up-to-date, update them, create and send the SA message, and finally restore the saved contents of those state items especially updated for the synchronisation point.

[1370] The preferred choices for synchronisation points have already been mentioned. However, it should further be mentioned that not every transfer of control need necessarily be chosen as a synchronisation point. It is also possible to use every point of bytecode, but the risk of perturbation will be increased. The important feature in choosing synchronisation points is that they must be points where the current states can either be identified easily, for example, where all elements are in their home state, or can readily be put there. It is not normally possible to choose points within a section of an execution since the order of elements within a section may be altered as a consequence of that execution and there will not be a common point of reference for the Slave and Master implementations. Equally, synchronisation points should not be chosen too far apart since the chunk of code between them could possibly be too large for efficient investigation should an error have occurred in that chunk.

[1371] For these reasons, it is preferable that, at synchronisation points, the execution goes out to a separate piece of code, termed a ‘piglier,’ whose function is to update any necessary states. Once synchronisation and the necessary transfer of data is complete, the piglier undoes the updating and returns to the compiled version. At this stage it is important that bugs are not imported into or removed from the compiled version.

[1372] A typical code buffer configuration is shown in FIG. 11A in which the left side of the drawing shows a generalised schematic whilst the right side illustrates the code buffer contents involved around a synchronisation point.

[1373] Fragments 7100 are generated at one end of a code buffer 7102 and outliers (‘pigliers’) 7104 at the other end.

[1374] At code generation time, the compiler lays down fragments of compiled code as normal until it detects a synchronisation point. The compiler saves its state at that point (i.e., ‘Dynamic compiler state A’ in FIG. 11A) then lays down the piglier 7106 itself and the jump to it (i.e., ‘JMP piglier’). The code laid down for the piglier 7106 consists of code to save off the current contents of any VM state elements that are not up-to-date but need to be for this sync point; code to update those state elements; a call to the function to send the SA message; code to restore the previous contents of the VM state elements; and any code necessary to restore the saved compiler state (‘Dynamic compiler State A’). For example, if the fragments of compiled code before the sync point had a particular value in a given register and the piglier code had changed the value in the register, then some code would be laid down to restore the original value of that register. The final code laid down in the piglier 7106 is a jump back to the fragment of compiled code following the ‘JMP piglier’ instruction.
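The save, update, send and restore sequence performed by a piglier can be illustrated with a short sketch. It is not generated machine code: the function `run_piglier` and its parameters are hypothetical, VM state elements are modelled as dictionary entries, and `pending` is assumed to hold the up-to-date values that compiled code is keeping elsewhere (for example in registers).

```python
def run_piglier(vm_state, pending, send_sa):
    """Model of the work a piglier does at one synchronisation point.

    vm_state: dict of VM state elements in their home locations
    pending:  dict of elements whose current value is not yet in vm_state
    send_sa:  callable standing in for the function that sends the SA message
    """
    # Save off the current contents of the elements that are not up to date.
    saved = {name: vm_state[name] for name in pending}
    # Update those state elements for this sync point.
    vm_state.update(pending)
    # Send the SA message (a snapshot, so later restores do not alter it).
    send_sa(dict(vm_state))
    # Restore the previous contents so the compiled code sees no change.
    vm_state.update(saved)
```

After the call, `vm_state` is exactly as it was before, which mirrors the requirement that executing the piglier leaves the surrounding compiled fragments unaffected.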

[1375] The same process, but this time expressed in terms of fragments, is illustrated in FIG. 11B in the situation where there is no pigging and where there is pigging. As can be seen from this schematic representation, the code sections on either side of the synchronisation point (SP) are designated B and C. In the ‘no pigging’ case, the state of the dynamic compiler at SP during code generation time is termed ‘Dynamic compiler state R’. In the ‘pigging’ case, the dynamic compiler must generate code such that the code sections B and C are identical to the code sections B and C respectively generated in the ‘no pigging’ case, and hence the state of the dynamic compiler both before and after generating the piglier code in the ‘pigging’ case is identical to ‘Dynamic compiler state R’ at the sync point SP in the ‘no pigging’ case. This ensures that when the generated code is executed, the execution of the piglier is essentially transparent and has no side-effects in the generated fragment code.

[1376] While it is possible for the piglier to be implemented in line, it is not the preferred option since it is unhelpful when trying to debug, and it makes it more difficult to identify and check the section of code between synchronisation points where the error occurred.

[1377] It is generally the case that the more processing that is done in the piglier, the more difficult it is to restore states. Also, the more frequent the synchronisation points, the more difficult it is to run the same code without turning off optimisations where there is the greater likelihood of errors occurring.

[1378] The preferred choices for synchronisation points are the conditional transfers of control, both back and forward, and optionally also invoke points. Function/method returns are not preferred.

[1379] Multi-Threading Issues

[1380] If the virtual machine is multi-threaded, then the Master and Slave VMs will synchronise each execution thread separately. They must have a method of identifying corresponding execution threads on both VMs and exchanging messages at critical points such as thread and monitor state changes and creation.

[1381] With regard to synchronisation, there is a given starting thread, so the start conditions on the Master and Slave will be known. The behaviour of a thread in creating another thread is predictable as is the order of thread creation/shut down. It is therefore possible for the exchange of messages between Master and Slave to take place at thread start up points.

[1382] When the Master thread A creates a thread B, that information is communicated to the Slave so that the next thread which the corresponding thread A in the Slave creates will (or should) also be B. The Master (and Slave) create a table containing the Master thread identity (e.g., ‘thread B i.d. is 5’) and the Slave thread identity (e.g., ‘my thread B i.d. is 5’) which can then be used to exchange messages. The same principle may be used for created objects. The SA and SB messages sent between Master and Slave must contain the id of the sending thread.
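A minimal sketch of such a correspondence table follows. The class `ThreadMap` and its method names are hypothetical, and thread identities are modelled as plain integers. It relies on the property stated above that the order in which a given thread creates its children is the same on both VMs.

```python
from collections import deque


class ThreadMap:
    """Pairs Master thread ids with Slave thread ids in creation order."""

    def __init__(self, master_root, slave_root):
        # The given starting thread is known on both sides.
        self.master_to_slave = {master_root: slave_root}
        self.slave_to_master = {slave_root: master_root}
        self.pending = {}  # master parent id -> children awaiting a Slave match

    def master_created(self, master_parent, master_child):
        """Record that a Master thread created a child thread."""
        self.pending.setdefault(master_parent, deque()).append(master_child)

    def slave_created(self, slave_parent, slave_child):
        """Match the Slave's newly created thread to the oldest pending child."""
        master_parent = self.slave_to_master[slave_parent]
        master_child = self.pending[master_parent].popleft()
        self.master_to_slave[master_child] = slave_child
        self.slave_to_master[slave_child] = master_child
        return master_child
```

The same structure could in principle be reused for created objects, keyed by the creating thread, since the text notes that the same principle applies.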

[1383] The Communication Mechanism

[1384] The communication required for this method can be implemented on top of any suitable transport mechanism, e.g., sockets or named pipes.

[1385] It is preferable that the VM used for the trusted implementation is a specially built VM to support pigging. Although that VM may be used otherwise than for pigging, it will generally be slow. It is preferable that the VM used for the implementation under test is a specially built VM to support pigging and pigliers.

[1386] In any or all of the aforementioned, certain features of the present invention have been implemented using computer software. However, it will of course be clear to the skilled man that any of these features may be implemented using hardware or a combination of hardware and software. Furthermore, it will be readily understood that the functions performed by the hardware, the computer software, and such like, are performed on or using electrical and like signals.

[1387] Features which relate to the storage of information may be implemented by suitable memory locations or stores. Features which relate to the processing of information may be implemented by a suitable processor or control means, either in software or in hardware or in a combination of the two.

[1388] In any or all of the aforementioned, the invention may be embodied in any, some or all of the following forms: it may be embodied in a method of operating a computer system; it may be embodied in the computer system itself; it may be embodied in a computer system when programmed with or adapted or arranged to execute the method of operating that system; and/or it may be embodied in a computer-readable storage medium having a program recorded thereon which is adapted to operate according to the method of operating the system.

[1389] As used herein throughout the term ‘computer system’ may be interchanged for ‘computer,’ ‘system,’ ‘equipment,’ ‘apparatus,’ ‘machine,’ and like terms. The computer system may be, or may include, a virtual machine.

[1390] In any or all of the aforementioned, different features and aspects described above, including method and apparatus features and aspects, may be combined in any appropriate fashion.

[1391] It will be understood that the present invention(s) has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.

[1392] Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

[1393] Agent's Reference No. 12 - Computer System, Computer-Readable Storage Medium and Method of Operating Same, and Method of Operating That System

[1394] The present invention relates in one aspect to a method of creating a link from a first piece of compiled code to a second piece of compiled code, and to a method of compiling code. It relates in another aspect to methods of and apparatus for examining memory in a computer system to allow a section of compiled code to be deleted, and to a method of and apparatus for deleting compiled code in a computer system, in particular where there may be a link between sections of compiled code. The invention has particular (but not exclusive) application to a self-modifying multi-threaded environment. In a preferred embodiment, the invention relates to multi-threaded fragment patching.

[1395] A self-modifying environment may be one in which sections of compiled code are created and deleted dynamically during execution. Such an environment is described in Agent's Reference No. 1 of this specification. A multi-threaded environment is one in which several processes, or threads, operate asynchronously in the same workspace.

[1396] In a self-modifying environment there may be situations in which a link must be made between a first section of compiled code and a second section of compiled code that is located elsewhere in the workspace, to enable execution to transfer between the two sections of code. The process of transferring execution from one piece of code to the other generally involves a number of steps, including putting the address of the first piece of code on the stack, together with register values, transferring execution to an intermediate piece of code that identifies the location of the second piece of code, and then transferring execution to the second piece of code. A problem with transferring execution in this way is that a relatively large amount of time is spent in making the transfer.

[1397] In a first aspect of the present invention there is provided a method of providing a link between two pieces of compiled code in a self-modifying multi-threaded computer system, including inserting a patch from one piece of compiled code to the other.

[1398] By providing patches from one piece of compiled code to another, execution may transfer more quickly than if the patches were not made.

[1399] The step of inserting a patch may include changing a control transfer instruction within the compiled code. The control transfer instruction may be any instruction which causes execution to transfer to another address, such as a jump instruction or a call instruction. The control transfer instruction may be changed to point to the address of the piece of code to which a patch is made.

[1400] The step of changing a control transfer instruction may be carried out atomically. By atomically it is preferably meant that the other threads cannot view the area being changed in a partially changed form. This may be achieved for a single processor system by ensuring that the step of inserting a patch is carried out as a single write operation. Alternatively, some processors provide one or more special instructions or sequences of special instructions which are defined to ensure atomicity; such instructions may be used to ensure atomicity in single processor and multi-processor systems. In this way it can be ensured that patch manipulation is completed before any other operations which may influence the work space are carried out.

[1401] The first aspect of the invention also provides an apparatus for providing a link between two pieces of compiled code in a self-modifying multi-threaded computer system, including means for inserting a patch from one piece of compiled code to the other.

[1402] The first aspect of the invention also provides a method of compiling code, the code including two possible paths of execution, the method including compiling the code corresponding to one of the paths of execution, and including in the compiled code a control transfer instruction which is capable of being changed atomically to point to the address of another piece of code.

[1403] In this way, the compiled code can be arranged so that a patch to another piece of code can be made after the code has been compiled, in particular, to enable the other path to be executed.

[1404] Preferably, the control transfer instruction is of a type which can point to an address which is further from its own address than if the optimum form of the control transfer instruction were used. This feature can allow the patch to be to a more distant piece of code than would otherwise be the case.

[1405] The method may include forming an outlying section of code which includes the control transfer instruction.

[1406] The first aspect of the invention also provides a compiler adapted to carry out any of the above methods of compiling code.

[1407] In some circumstances it may be desirable or necessary to remove the patches which have been made, for example, because a code buffer containing a section of compiled code is to be deleted, or because assumptions which were made about a piece of compiled code prove not to be valid.

[1408] Thus, in a second aspect of the invention there is provided a method of examining memory containing a section of compiled code in a self-modifying multi-threaded computer system, including identifying any patches into the section of compiled code, and redirecting any such patches. The method may be carried out, for example, because a section of compiled code is to be deleted, or because the section of compiled code is no longer to be used. The redirection of the patch enables execution to continue at the patch without the section of compiled code.

[1409] The second aspect of the invention further provides a method of deleting compiled code in a self-modifying multi-threaded computer system, including selecting a section of compiled code to be deleted, identifying any patches into the section of compiled code, redirecting any such patches, and deleting the section of compiled code.

[1410] Preferably, any such patches are directed to the address of a continuation code. The continuation code enables execution to continue without the section of code. The continuation code may be arranged to effect interpretation of subsequent instructions, or it may be arranged to perform a dispatch table transfer.

[1411] Preferably, the step of redirecting a patch is done atomically, to ensure that other threads cannot access the location being patched when the patch operation is only partially completed. An alternative solution would be to stop all executing threads while the patch was redirected, but that is less preferred due to the execution time lost while the threads are stopped.

[1412] In order to identify patches going into the section of compiled code, the method may include calculating a hash value of the address of the section of compiled code, and examining a hash table of patch blocks to identify any patches into the section of compiled code.
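The lookup described here can be sketched as below. Everything named in the sketch is an assumption made for illustration: the bucket count, the hash function `bucket_of`, the class `PatchTable`, and the modelling of a patch block as a pair of the patched location and the start address of the section it targets. The specification does not state the layout of a patch block.

```python
NUM_BUCKETS = 64  # assumed table size


def bucket_of(section_address):
    """Assumed hash: discard low alignment bits, then reduce modulo the table size."""
    return (section_address >> 4) % NUM_BUCKETS


class PatchTable:
    """Hash table of patch blocks, keyed by the address of the target section."""

    def __init__(self):
        self.buckets = {}  # bucket index -> list of (patch_site, target_section)

    def record(self, patch_site, target_section):
        """Note that the instruction at patch_site now points into target_section."""
        bucket = self.buckets.setdefault(bucket_of(target_section), [])
        bucket.append((patch_site, target_section))

    def patches_into(self, section):
        """Return every patch site recorded as pointing into the given section."""
        bucket = self.buckets.get(bucket_of(section), [])
        return [site for site, target in bucket if target == section]

    def redirect_all(self, section, continuation, code):
        """Point each patch into `section` at the continuation code instead."""
        sites = self.patches_into(section)
        for site in sites:
            code[site] = continuation  # models one atomic write per patch
        bucket_index = bucket_of(section)
        remaining = [
            entry for entry in self.buckets.get(bucket_index, []) if entry[1] != section
        ]
        self.buckets[bucket_index] = remaining
        return sites
```

Only the bucket selected by the hash is scanned, so the cost of finding the patches into a section depends on the occupancy of that bucket and not on the total number of patches in the system.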

[1413] In the interests of efficient memory usage, any unused patches (such as patches out of the code buffer) should be deleted, so that the overhead associated with the patch can be reclaimed. Therefore, the method preferably further includes identifying any patches out of the section of compiled code, and removing any such patches.

[1414] Thus, the second aspect of the present invention also provides a method of examining memory in a self-modifying multi-threaded computer system when a section of compiled code is to be deleted, including identifying any patches out of the section of compiled code and removing any such patches.

[1415] Preferably the method of examining memory further includes the steps of:

[1416] examining a frame of a stack in the computer system;

[1417] identifying whether the frame contains a return address which is in the range of addresses of the section of compiled code to be deleted;

[1418] and altering the contents of the frame when such a return address is identified.
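The three steps above can be sketched as a simple stack walk. The sketch assumes that 'altering the contents of the frame' means replacing the return address with the address of some continuation code; the `Frame` class, the function `fix_frames`, the half-open address range and that choice of alteration are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical stack frame holding only the field of interest."""
    return_address: int


def fix_frames(stack, start, end, continuation):
    """Retarget return addresses that fall within the section [start, end).

    Returns the number of frames altered.
    """
    changed = 0
    for frame in stack:  # examine each frame of the stack
        if start <= frame.return_address < end:  # return address in the doomed range?
            frame.return_address = continuation  # alter the contents of the frame
            changed += 1
    return changed
```

In a multi-threaded system the walk would be repeated for the stack of every thread before the section is deleted.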

[1419] Thus, the second aspect of the invention also provides a method of examining memory in a self-modifying multi-threaded computer system to allow a section of compiled code to be deleted, the method including the steps of:

[1420] examining a frame of a stack in the computer system;

[1421] identifying whether the frame contains a return address which is in the range of addresses of the section of compiled code;

[1422] altering the contents of the frame when such a return address is found;

[1423] identifying any patches into the section of compiled code; and

[1424] redirecting any such patch.

[1425] Thus the second aspect of the invention preferably includes one or more of the features of one or more aspects of the invention described in Agent's Reference No. 6 of this specification.

[1426] Preferably, the method further includes identifying any patches out of the section of compiled code and removing any such patches.

[1427] Preferably, the alteration of the contents of the frame and/or the redirecting of the patch are carried out at the time of deletion of the section of compiled code rather than, for example, as patches or returns into the deleted code are found during execution.

[1428] The second aspect of the invention also provides apparatus for examining memory in a self-modifying multi-threaded computer system to allow a section of compiled code to be deleted, including means for identifying any patches into the section of compiled code, and means for redirecting any such patches. Thus, execution may continue at the patch without the section of compiled code.

[1429] The second aspect of the invention also provides an apparatus for deleting compiled code in a self-modifying multi-threaded computer system, including means for selecting a section of compiled code to be deleted, means for identifying any patches into the section of compiled code, means for redirecting any such patches, and means for deleting the section of compiled code.

[1430] Preferably, the apparatus includes means for calculating a hash value of the address of the section of compiled code, and means for examining a hash table of patch blocks to identify any patches into the section of compiled code.

[1431] Preferably, the apparatus further includes means for identifying any patches out of the section of compiled code, and means for removing any such patches.

[1432] The second aspect of the invention also provides apparatus for examining memory in a self-modifying multi-threaded computer system to allow a section of compiled code to be deleted including means for identifying any patches out of the section of compiled code and means for removing any such patches.

[1433] Features of one aspect may be applied to other aspects; similarly, method features may be applied to the apparatus and vice versa.

[1434] Preferred features of the present invention will now be described, purely by way of example, with reference to the accompanying drawings, in which:

[1435] FIGS. 12A to 12D illustrate the use of patches in compiled code;

[1436] FIG. 12E is a flow diagram of a preferred method embodiment;

[1437] FIG. 12F illustrates the use of patches with potentially polymorphic methods; and

[1438] FIG. 12G is a block diagram of a preferred apparatus embodiment.

[1439] The following considerations apply to any and all the inventions and aspects of the inventions described above.

[1440] As described above in Agent's Reference No. 1 of this specification, dynamic compilation may result in fragments of code in a method being compiled, rather than the whole method. The fragments that are compiled correspond to the dominant path, as determined, for example, from the run time representation of the source program and execution history information. At a later stage, other fragments of code may be compiled, for example, where the original assumptions that were made about the dominant path prove to be incorrect.

[1441] As an example, if the code contains a conditional control transfer instruction (such as a conditional branch instruction or a conditional call instruction), the compiler decides whether or not the transfer is likely to be made, and then compiles the code corresponding to the path that is most likely to be followed (the dominant path). However, during execution, it may be decided that in fact the other path should be followed. In such circumstances, when the transfer instruction is encountered, execution transfers to a piece of code known as ‘glue code.’ If the path that is to be followed has not been compiled, then the glue code causes interpretation of subsequent instructions in the path to be followed. If the interpreted path is followed a certain number of times, the compiler may decide that it is worthwhile compiling that section of code, and will then produce a compiled version of the code.
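The behaviour of the glue code in this example can be modelled very roughly as a counter with a threshold. The class `GlueCode`, the fixed `threshold` parameter and the string results are hypothetical; in particular, the sketch compiles immediately on reaching the threshold, whereas the compiler is described only as one that 'may decide' to compile.

```python
class GlueCode:
    """Toy model: interpret an uncompiled path until it has run `threshold` times."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = {}       # path -> number of times interpreted
        self.compiled = set()  # paths for which a compiled version exists

    def transfer(self, path):
        """Handle a transfer off the dominant path and report how it was executed."""
        if path in self.compiled:
            return "compiled"
        self.counts[path] = self.counts.get(path, 0) + 1
        if self.counts[path] >= self.threshold:
            # Stand-in for the compiler deciding that compilation is worthwhile.
            self.compiled.add(path)
        return "interpreted"
```

With a threshold of three, the first three transfers along a path are interpreted and later ones use the compiled version.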

[1442] A self-modifying environment is thereby created, in which sections of compiled code are created (and possibly deleted) dynamically during execution. Such an environment is typically multi-threaded, with several processes operating in the same work space concurrently.

[1443] According to a preferred embodiment, in such a situation, a patch is made from the transfer instruction in the original section of code to the newly compiled section of code. The patch modifies the transfer instruction so as to cause execution to transfer directly to the address of the newly compiled section of code. In order to allow the patch to be made, at the time of compilation the compiled code is arranged so that a patch can be inserted at a later stage, should this be required. This is done, for example, by compiling a longer form of the transfer instruction than is necessary for the original compiled code, to allow a transfer to a more distant piece of code to be made at a later stage.

[1444] A patch may also be made from the newly compiled section of code back to the original section of code, if necessary.

[1445] It should be noted that in a multi-threaded environment, patching such as that described above needs to be done atomically, that is, as a single instruction, so that other threads cannot view the area being changed in a partially changed form. Therefore, the code is arranged so that the patch can be made atomically. To retain atomicity, the patching could be done as a single write operation. Alternatively, some processors provide one or more special instructions or sequences of special instructions which ensure atomicity. In a multi-processor environment the address of the location being patched will probably, for many processors, need to be aligned according to the size of the patch data (such that the address is an integer multiple of the size of the operation).
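The alignment condition in the last sentence is simple arithmetic and can be written down directly. The helper names `is_aligned_for_patch` and `padding_for` are hypothetical; whether an aligned write is in fact atomic depends on the processor and is not something this sketch can establish.

```python
def is_aligned_for_patch(address, size):
    """True if the address is an integer multiple of the patch size in bytes."""
    return address % size == 0


def padding_for(address, size):
    """Bytes of padding a compiler would need to emit to reach the next aligned address."""
    return (-address) % size
```

A compiler following this scheme could emit `padding_for(address, size)` filler bytes before the patchable operand so that a later patch can be applied as one aligned write.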

[1446] A first example will now be described with reference to FIGS. 12A and 12B. This example concerns the case where the non-native code contained a call instruction.

[1447] Referring to FIG. 12A, a first code fragment 23002 has a call instruction 23003 at address aaa. In the original non-native code this call instruction called the subroutine ‘bar’. During compilation the subroutine bar was not compiled (for example, because it was not certain which version of bar would be used), but instead a piece of outlying code 23004 was created to deal with the situation where bar is called. Call instruction 23003 points to address abd in the outlying code. At this address there is a call instruction 23005 which transfers execution to a piece of glue code. The glue code causes the subroutine bar to be interpreted, if no compiled version of bar exists. Thus, when call instruction 23003 is executed, the glue code is called.

[1448] Referring now to FIG. 12B, at some later time the subroutine bar has been compiled. The compiled version of bar is stored as compiled code 23006 at address xyz, in this example. A patch 23008 is then made from code fragment 23002 to compiled code 23006, either directly or via outlying code 23004.

[1449] FIG. 12B shows the case where the patch is made directly. In this case, call instruction 23003 is changed so as to point directly to address xyz. This is possible if call instruction 23003 has been compiled in a form which is atomically patchable to address xyz.

[1450] FIG. 12C shows the case where the patch is made via outlier 23004. In this case, a jump instruction 23007 at address abc in the outlier 23004 is set to jump to address xyz, and call instruction 23003 is changed to point to address abc. Alternatively, call instruction 23003 could point permanently to address abc, in which case jump instruction 23007 would point initially to address abd (to call the glue code) and would then be changed to point to address xyz (to make the patch).

[1451] In each case, the instruction that is changed to point to address xyz is in a long form to allow transfers to relatively distant addresses. Thus, when compiling the code, allowance must be made for this. For example, the call instruction 23003 could be made to be a longer version than is required if the instruction were only to point to address abd, to allow the instruction to be changed to point to a more distant address in the future. It must also be ensured that the instruction is of a type which can have the address to which it points changed atomically.

[1452] At the end of the compiled version of subroutine bar, a return instruction causes control to transfer directly back into code 23002. Once the patch has been made, execution can transfer from compiled code 23002 to compiled code 23006 and back again without the need for glue code.
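
The arrangement of FIGS. 12A to 12C can be modelled, purely by way of illustration, with the following sketch. It treats code memory as a mapping from the address labels used above (aaa, abc, abd, xyz) to transfer instructions, and stands in for an atomic patch with a single dictionary assignment. The class `CodeMemory`, its methods, and the `GLUE` label are hypothetical names introduced for this sketch only.

```python
GLUE = "glue"  # hypothetical label for the entry point of the glue code


class CodeMemory:
    """Toy model of code fragment 23002 and outlier 23004."""

    def __init__(self):
        # address -> (opcode, target). Replacing one entry with a single
        # assignment stands in for one atomic write to the instruction.
        self.instr = {
            "aaa": ("call", "abd"),  # call instruction 23003
            "abc": ("jump", "abd"),  # jump instruction 23007 in the outlier
            "abd": ("call", GLUE),   # call instruction 23005 to glue code
        }
        self.compiled = {}  # address -> compiled subroutine (a callable)

    def install_compiled(self, address, fn):
        """Record that compiled code now exists at `address`."""
        self.compiled[address] = fn

    def patch(self, address, new_target):
        """Retarget the transfer instruction at `address` in one step."""
        opcode, _ = self.instr[address]
        self.instr[address] = (opcode, new_target)

    def run_call(self, address):
        """Follow transfers from `address`; return (result, hops taken)."""
        hops = 0
        _, target = self.instr[address]
        while True:
            hops += 1
            if target == GLUE:
                # Glue code would cause the subroutine to be interpreted.
                return ("interpreted", hops)
            if target in self.compiled:
                return (self.compiled[target](), hops)
            _, target = self.instr[target]
```

In this model, a direct patch corresponds to `patch("aaa", "xyz")` after `install_compiled("xyz", ...)`, while a patch via the outlier corresponds to `patch("abc", "xyz")` followed by `patch("aaa", "abc")`. Either way, the glue code is no longer reached from the call site.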

[1453] For each patch, information concerning the patch is recorded in a patch block, which is stored in the code buffer (area of memory) from which the patch originates.

[1454] FIG. 12D illustrates an example where a section of code 23012 contains a conditional branch instruction 23013 at address aaa. During compilation, the compiler decided that the branch instruction was unlikely to be followed, and so the instructions at the address to which the original (non-native) branch instruction pointed were not compiled. In order to cope with the situation where this assumption is wrong, the compiler inserted outlier 23014. Initially, instruction 23013 points to address abd in the outlier. At this address there is a call to glue code. The glue code causes the instructions at the address to which the original (non-native) branch instruction pointed to be interpreted.

[1455] At some later stage, the instructions to which the branch instruction points may be compiled, for example, because the initial assumption that these instructions are unlikely to be executed has proved to be incorrect. The compiled version of these instructions is shown at address xyz in this example. A patch may then be made directly to the compiled code at address xyz. This is done by changing a jump instruction at address abc to point to address xyz, and by changing instruction 23013 to point to address abc. Alternatively, instruction 23013 could point permanently to address abc, and the jump instruction at that address could point initially to abd, and then be changed to point to xyz. Again, at the time of compilation, the instructions which are to be changed to make the patch are set up so that the patch can be made atomically.

[1456] Thus, it will be seen that an important aspect of the above techniques is that the compiled code is arranged so that patches may be inserted at a later stage. This can be done by ensuring that, where there are two or more possible paths of execution and only one path is compiled, there exists a control transfer instruction (such as a call or jump instruction) that can be modified atomically to transfer execution to a relatively distant address.

[1457] The outliers described above may also include code for updating registers and states, before transferring control out of the compiled version of code. Such outliers are described in more detail in Agent's Reference No. 3 of this specification.

[1458] In some circumstances it may be desirable or necessary to remove the patches which have been made. For example, at some stage a code buffer containing a section of compiled code may be deleted. This may be because the code buffer is required for use elsewhere, or because assumptions that were made during compilation are no longer valid. Also, it is desirable to remove any code which is not expected to be required in the future, particularly when working in a limited memory environment. If there is a patch into the code buffer, deleting the code buffer would leave a patch to a section of code that no longer exists.

[1459] At the time of deletion of a code buffer, the code buffer is examined to see if there are any patches going into or out of the buffer. Any patches going into the code buffer are redirected so as to allow execution to continue without the buffer to be deleted, for example, by redirecting the patch to a piece of glue code or outlier code. Any data structures relating to patches going out of the buffer are removed, in order to reclaim the overhead.

[1460] As mentioned above, when a patch is made, information concerning the patch is stored in a patch block. Each patch block gives the ‘from’ address and the ‘to’ address of the patch to which it relates. The patch blocks are stored as a chain in the code buffer where the patches originate. Each code buffer therefore has a chain of patch blocks relating to the patches from that buffer. The patch blocks are simultaneously chained together on a second chain, according to where the patch is to. At the same time, a hash table is maintained, which allows access to the various chains. A hash table is a data structure consisting of multiple chains of blocks, in which elements are grouped according to an arbitrary mathematical function. Hash tables are described in more detail in Agent's Reference No. 4 of this specification.
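
A minimal sketch of the double chaining described above is given below. Python lists stand in for the chains, and the names `PatchBlock`, `CodeBuffer`, `PatchTable`, `bucket_of` and `NUM_BUCKETS`, as well as the particular hash function, are assumptions of the sketch rather than details taken from the specification.

```python
from dataclasses import dataclass, field

NUM_BUCKETS = 8  # hypothetical number of chains in the hash table


def bucket_of(buffer_addr: int) -> int:
    """Arbitrary hash of a code buffer address (an assumption of this sketch)."""
    return (buffer_addr >> 4) % NUM_BUCKETS


@dataclass(eq=False)
class PatchBlock:
    from_addr: int   # address of the patched transfer instruction
    to_addr: int     # address the patch transfers execution to
    to_buffer: int   # base address of the code buffer containing to_addr


@dataclass(eq=False)
class CodeBuffer:
    base: int
    # First chain: patch blocks for patches originating in this buffer.
    from_chain: list = field(default_factory=list)


class PatchTable:
    """Second chain: patch blocks grouped by hash of their 'to' buffer."""

    def __init__(self):
        self.to_chains = [[] for _ in range(NUM_BUCKETS)]

    def record(self, src: CodeBuffer, dst: CodeBuffer, from_addr: int, to_addr: int):
        """Create one patch block and link it onto both chains."""
        block = PatchBlock(from_addr, to_addr, dst.base)
        src.from_chain.append(block)
        self.to_chains[bucket_of(dst.base)].append(block)
        return block

    def patches_into(self, dst: CodeBuffer):
        """Blocks on the hashed chain that really target `dst`.

        Different buffers may hash to the same chain, so each block's
        'to' buffer is checked explicitly.
        """
        chain = self.to_chains[bucket_of(dst.base)]
        return [b for b in chain if b.to_buffer == dst.base]
```

The same block object is reachable both from the originating buffer and from the hash table, which is what allows patches into a buffer to be found without scanning every other buffer.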

[1461] In order to find the patches going into the buffer to be deleted, a hash (using the same mathematical function as the hash table) is made of the address of the buffer that the patch causes a transition to, in order to find the chain containing the patch blocks relating to ‘to’ patches. The patch blocks in the chain are then examined to see if they relate to patches to the buffer to be deleted. When such a patch block is found, the patch to which it relates is redirected, for example, to a piece of glue code or outlier code, and the patch block itself is removed from the chain. The glue code is designed to perform some generalised checks, and to cause the continuation of the flow of execution, for example by interpretation of subsequent instructions, or by jumping to another piece of compiled code. Further discussion of the action of the glue code can be found in Agent's Reference No. 1 of this specification.

[1462] It may also be determined whether there are any patches from the buffer to be deleted. This can be done by examining the chain of patch blocks stored in the buffer to be deleted using the first chain described above. The patch blocks in this chain are examined, and if a patch which has not yet been deleted exists, the patch is deleted. In this way, the overhead associated with the patch may be reclaimed.

[1463] Referring to FIG. 12E, a method of removing patches when a code buffer is to be deleted will be described. In step 23020 it is decided that a certain code buffer is to be deleted. In step 23022 a hash is made of the address of the buffer. In step 23024 a patch block is selected from the ‘to’ chain in the hash table. In step 23026 it is decided, from the patch block, whether there is a patch into the buffer that is to be deleted. If there is such a patch, then in step 23028 the patch is redirected, for example, to the address of a piece of glue code, and the patch block is removed from the chain in the hash table. In step 23030 it is determined whether the patch block is the last in the chain. If not, then the sequence of selecting and testing a patch block is repeated.

[1464] Once the hash table has been examined for all patches into the code buffer, it is then examined for patches out of the code buffer. In step 23032 a patch block is selected from the ‘from’ (jump source) chain in the code buffer to be deleted. In step 23034 a hash is made of the ‘to’ buffer address. In step 23036 the patch block is removed from the hash chain relating to the ‘to’ buffer for that patch. In step 23038 it is determined whether the patch block is the last in the ‘from’ chain, and if not the sequence is repeated for other patch blocks in the chain until all the patch blocks have been examined. Finally, in step 23039, the code buffer is deleted.
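
The sequence of steps 23020 to 23039 can be sketched as follows. This is an illustrative model only: code buffers are identified by small integers, the hash is a simple modulus, and the names `Vm`, `PatchBlock`, `GLUE_ADDR`, `NUM_BUCKETS` and the `live` flag (used to mark a patch that has already been redirected) are assumptions of the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass

GLUE_ADDR = 0xFFFF  # hypothetical address of a piece of glue code
NUM_BUCKETS = 8     # hypothetical hash table size


def bucket(buffer_id: int) -> int:
    """Stand-in for the hash made of a code buffer's address."""
    return buffer_id % NUM_BUCKETS


@dataclass(eq=False)
class PatchBlock:
    from_buf: int
    from_addr: int
    to_buf: int
    to_addr: int
    live: bool = True  # False once the patch has been redirected


class Vm:
    def __init__(self):
        self.targets = {}                    # 'from' address -> current target
        self.from_chain = defaultdict(list)  # buffer id -> blocks originating there
        self.to_table = defaultdict(list)    # hash bucket -> blocks by 'to' buffer
        self.buffers = set()

    def add_patch(self, from_buf, from_addr, to_buf, to_addr):
        block = PatchBlock(from_buf, from_addr, to_buf, to_addr)
        self.buffers.update((from_buf, to_buf))
        self.targets[from_addr] = to_addr
        self.from_chain[from_buf].append(block)
        self.to_table[bucket(to_buf)].append(block)
        return block

    def delete_buffer(self, buf):
        # Steps 23022-23030: hash the buffer address and walk the 'to' chain.
        chain = self.to_table[bucket(buf)]
        for block in list(chain):
            if block.to_buf == buf:
                # Step 23028: redirect the patch and unlink its block.
                self.targets[block.from_addr] = GLUE_ADDR
                chain.remove(block)
                block.live = False
        # Steps 23032-23038: walk the 'from' chain of the doomed buffer.
        for block in self.from_chain.pop(buf, []):
            if block.live:
                # Steps 23034-23036: unlink from the 'to' buffer's hash chain.
                self.to_table[bucket(block.to_buf)].remove(block)
            # The patched instruction disappears together with the buffer.
            self.targets.pop(block.from_addr, None)
        # Step 23039: delete the code buffer itself.
        self.buffers.discard(buf)
```

After `delete_buffer`, every patch that pointed into the deleted buffer transfers to the glue code address instead, and no hash chain retains a block that originated in the deleted buffer.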

[1465] In another example, the patches to or from a section of compiled code are removed, not because the code buffer in which the code is stored is to be deleted, but because the compiled code is no longer to be used, for instance, because assumptions that were made during compilation are no longer valid. For example, when a potentially polymorphic method has been assumed to be final, and a patch has been made to a compiled version of that method, if it is later discovered that the method is not final, then the patch to the compiled version must be removed. Reference is made in this respect to Agent's Reference No. 9 of this specification.

[1466] Referring to FIG. 12F, a section of compiled code 23072 contains a call to a method, which may be polymorphic. Initially the method to be called has not been compiled. Call instruction 23073 points to address abc in a piece of outlying code 23074. At this address, there is a call to glue code. The glue code will determine which version of the method to use, and will cause that version to be executed.

[1467] Later, an assumption may be made that the method is final, and the method may be compiled. The compiled version of the method 23076 is shown at address xyz. A patch 23078 may then be made directly to the compiled version of the method. This is done by changing instruction 23073 to point directly to address xyz. Return 23079 is made back to code 23072.

[1468] Later still, the assumption that the method was final may prove to be false. In this situation, patch 23078 is removed, since it is not certain which version of the method should be used. Instruction 23073 is then changed to point to address abd. At this address there is a call to the dispatch table. The dispatch table determines which version of the method should be used, and whether there is a compiled version. If there is a compiled version, execution jumps to that version; if not, execution jumps to glue code which causes the method to be interpreted.
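
The three states of call instruction 23073 in FIG. 12F (pointing at abc, then xyz, then abd) can be illustrated with the sketch below. The class `CallSite`, the helper `interpret`, and the per-class store of compiled versions are hypothetical and serve only to show the order of the transitions; they are not a description of the actual dispatch table.

```python
def interpret(receiver_cls: str, method: str) -> str:
    """Stand-in for glue code causing the method to be interpreted."""
    return f"interpreted {receiver_cls}.{method}"


class CallSite:
    # Address labels taken from FIG. 12F.
    GLUE, DIRECT, DISPATCH = "abc", "xyz", "abd"

    def __init__(self, method: str):
        self.method = method
        self.target = self.GLUE   # initially the outlier calls glue code
        self.compiled = {}        # receiver class -> compiled version (assumed store)
        self.assumed_class = None

    def assume_final(self, cls: str, compiled_fn):
        """Compile under the 'final' assumption and patch directly (23078)."""
        self.compiled[cls] = compiled_fn
        self.assumed_class = cls
        self.target = self.DIRECT

    def revoke_final(self):
        """Remove the direct patch; route calls via the dispatch table."""
        self.assumed_class = None
        self.target = self.DISPATCH

    def call(self, receiver_cls: str) -> str:
        if self.target == self.DIRECT:
            # No per-call lookup: the assumption says only one version exists.
            return self.compiled[self.assumed_class]()
        if self.target == self.DISPATCH:
            fn = self.compiled.get(receiver_cls)
            if fn is not None:
                return fn()  # a compiled version exists for this receiver
        # Glue code path: interpret the appropriate version.
        return interpret(receiver_cls, self.method)
```

Once the assumption is revoked, a receiver with a compiled version still runs compiled code, whereas any other receiver falls back to interpretation.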

[1469] The technique for deleting compiled code may be used in combination with the ‘Stack Walking’ technique described in Agent's Reference No. 6 of this specification, and/or with any other techniques described herein.

[1470] Referring to FIG. 12G, an apparatus for putting the present embodiment into effect will be described. FIG. 12G shows a computer system including a virtual machine 23040 which allows non-native code 23042 to run on host computer 23044. The virtual machine includes control means 23046, interpreter 23048, compiler 23050, glue code 23054, and deletion means 23056. The host computer includes a processor 23058 and memory 23060 including code buffer 23062. Code buffer 23062 contains code which has been compiled by the compiler. The compiler is adapted to compile code in any of the ways described above. Also shown in FIG. 12G is patching means 23055 for inserting a patch from one piece of compiled code to another. The patching means 23055 is adapted to make a patch in any of the ways described above.

[1471] In operation, the control means 23046 may decide at a certain time that code buffer 23062 is to be deleted. It then consults a hash table 23052 to identify any patches going into or out of the code buffer in the way described above. If any patches are found going into the code buffer, the control means redirects those patches, for example, to glue code 23054. If any patches are found going out of the code buffer, the control means removes the patch blocks relating to those patches. The control means then instructs the deletion means 23056 to delete the code buffer.

[1472] It will be appreciated that the virtual machine shown in FIG. 12G will generally be in the form of software and stored in the memory of the host computer 23044.

[1473] It will be understood that the present invention has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.

[1474] Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

[1475] In any or all of the aforementioned, certain features of the present invention have been implemented using computer software. However, it will of course be clear to the skilled man that any of these features may be implemented using hardware or a combination of hardware and software. Furthermore, it will be readily understood that the functions performed by the hardware, the computer software, and such like, are performed on or using electrical and like signals.

[1476] Features that relate to the storage of information may be implemented by suitable memory locations or stores. Features which relate to the processing of information may be implemented by a suitable processor or control means, either in software or in hardware or in a combination of the two.

[1477] In any or all of the aforementioned, the invention may be embodied in any, some, or all of the following forms: it may be embodied in a method of operating a computer system; it may be embodied in the computer system itself; it may be embodied in a computer system when programmed with or adapted or arranged to execute the method of operating that system; and/or it may be embodied in a computer-readable storage medium having a program recorded thereon which is adapted to operate according to the method of operating the system.

[1478] As used herein throughout the term ‘computer system’ may be interchanged for ‘computer,’ ‘system,’ ‘equipment,’ ‘apparatus,’ ‘machine,’ and like terms. The computer system may be or may include a virtual machine.

[1479] In any or all of the aforementioned, different features and aspects described above, including method and apparatus features and aspects, may be combined in any appropriate fashion.

[1480] It will be understood that the present invention(s) has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.

[1481] Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

[1482] Some of the terms used above are specific to the Java language and to Java-type languages. Various aspects of the inventions and their embodiments are also applicable in the use of other languages. It will be understood that the terms used herein should be construed broadly, unless clear to the contrary, to include similar and/or corresponding features relating to other languages.

[1485] Features of one aspect of any one of the inventions may be applied to other aspects of the invention or other inventions described herein. Similarly, method features may be applied to the apparatus and vice versa.

CLAIMS

What is claimed is:
 1. A dynamic compiler comprising: an execution history recorder configured to record the number of times a fragment of code is executed, the execution history recorder having a threshold; an interpreter coupled to the execution history recorder; a compiler manager coupled to the execution history recorder; and a compiler coupled to the compiler manager and arranged to create compiled fragments of code.
 2. A dynamic compiler as claimed in claim 1, further comprising a threshold tuner coupled to the execution history recorder, the threshold tuner operable to adjust the threshold of the execution history manager.
 3. A dynamic compiler as claimed in claim 1, further comprising: a cache searcher coupled to the interpreter; a converter device coupled to the interpreter; and an execution device coupled to the converter device.
 4. A dynamic compiler as claimed in claim 1, further comprising a queue of frequently executed fragments of code coupled to the compiler.
 5. A dynamic compiler as claimed in claim 4, wherein the compiler manager administers the queue of frequently executed fragments of code for compilation.
 6. A dynamic compiler as claimed in claim 1, wherein the compiler manager further includes a memory manager that monitors memory available to the compiler.
 7. A dynamic compiler as claimed in claim 6, further comprising a deleter coupled to the memory manager.
 8. A dynamic compiler as claimed in claim 1, wherein the number of times a fragment of code is executed is recorded when the fragment of code is executed by the interpreter.
 9. A dynamic compiler as claimed in claim 1, wherein the dynamic compiler is a multi-threaded system and the compiler runs on a separate thread so the progress of code execution is not blocked.
 10. A dynamic compiler as claimed in claim 1, wherein the execution history recorder is further configured to record from where a transfer of control into the fragment of code came and to where control is transferred out of the fragment of code.
 11. A dynamic compiler as claimed in claim 1, wherein the execution history recorder is further configured to alert the compiler manager when the fragment of code has been executed the threshold number of times.
 12. A dynamic compiler as claimed in claim 1, wherein the compiled fragments of code created by the compiler include the dominant path.
 13. A dynamic compiler comprising: an interpreter; a cache searcher coupled to the interpreter; a converter device coupled to the interpreter; an execution device coupled to the converter device; an execution history recorder configured to record the number of times a fragment of code is compiled, the execution history recorder coupled to the interpreter and having a threshold; a threshold tuner coupled to the execution history recorder, the threshold tuner operable to adjust the threshold of the execution history manager; a compiler manager coupled to the execution history recorder; and a compiler coupled to the compiler manager.
 14. A dynamic compiler as claimed in claim 13, further comprising a queue coupled to the compiler.
 15. A dynamic compiler as claimed in claim 13, wherein the compiler manager further includes a memory manager that monitors memory available to the compiler.
 16. A dynamic compiler as claimed in claim 15, further comprising a deleter coupled to the memory manager.
 17. A dynamic compiler as claimed in claim 13, wherein the compiler generates compiled code fragments with only one entry point.
 18. A computer system comprising: a compiler for compiling the code of an application, the compiler arranged to compile a fragment of the code; a compiler manager coupled to the compiler; and an interpreter for interpreting code of the application, the interpreter coupled to an execution history recorder, the execution history recorder arranged to record the number of times a fragment of code is executed, to record the information regarding execution of fragments of code, and to alert the compiler manager when a fragment of code has been executed a threshold number of times.
 19. A computer system as claimed in claim 18, wherein the execution history recorder is arranged to record a path of execution from a first fragment to a second fragment.
 20. A computer system as claimed in claim 18, wherein the system is multi-threaded and the compiler runs on a separate thread to the thread executing the code.
 21. A computer system as claimed in claim 18, further comprising a queue coupled to the compiler.
 22. A computer system as claimed in claim 18, further comprising a memory manager that monitors memory available to the compiler.
 23. A computer system as claimed in claim 22, further comprising a deleter coupled to the memory manager.
 24. A computer system as claimed in claim 18, wherein the compiler, compiler manager, and interpreter are part of a virtual machine.
 25. A computer system containing a compiler for compiling operating code of an application in which only dominant path, or near dominant path, fragments of code are compiled.
 26. A method of compiling computer code, the method comprising: establishing an execution threshold; executing a number of fragments of the computer code; recording the number of times each of the fragments of code is executed; queuing one fragment of code for compilation when the number of times the one fragment of code has been executed matches the threshold; and compiling the one fragment of code.
 27. A method as claimed in claim 26, further comprising adjusting the threshold after it is established.
 28. A method as claimed in claim 26, further comprising monitoring memory available to the compiler.
 29. A method as claimed in claim 28, further comprising deleting code from memory to meet the requirements of the compiler.
 30. A method as claimed in claim 26, further comprising running the compiler on a thread that is separate from a thread of an interpreter.
 31. A method as claimed in claim 26, further comprising recording a transfer of control into one fragment of code and a transfer out of the one fragment of code.
 32. A method as claimed in claim 26, further comprising searching cache for preexisting compiled versions of fragments of code.
 33. A method as claimed in claim 26, wherein the computer code includes at least one Method, and at least one of the fragments of code includes less than the entire at least one Method.
 34. A method as claimed in claim 26, further comprising performing an exception check.
 35. A method as claimed in claim 34, further comprising performing a code optimization.
 36. A method as claimed in claim 34, further comprising interpreting exception code when an exception occurs.
 37. A method as claimed in claim 34, further comprising establishing a link to a bailout device.
 38. A method as claimed in claim 37, further comprising passing control to an interpreter.
 39. A method as claimed in claim 34, further comprising updating condition states.
 40. A method as claimed in claim 39, further comprising interpreting exception code after updating condition states.
 41. A dynamic compiler comprising: an execution history recorder configured to record the number of times a fragment of code is executed, the execution history recorder having a threshold; an interpreter coupled to the execution history recorder; a compiler manager coupled to the execution history recorder; a compiler coupled to the compiler manager, the compiler arranged to create compiled fragments of code with a pre-exception condition check at the beginning of at least one of the compiled fragments of code; a bailout device coupled to the compiler, the bailout device designed to transfer control from the compiler to another device; an execution device coupled to the bailout device; and a threshold tuner coupled to the execution history recorder, the threshold tuner operable to adjust the threshold of the execution history manager.
 42. A dynamic compiler as claimed in claim 41, wherein the bailout device is configured to access an area having glue code.
 43. A dynamic compiler as claimed in claim 41, wherein the compiler is arranged to compile fragments of code on an assumption that one or more exceptions will not occur.
 44. A compiler as claimed in claim 41, further comprising a queue coupled to the compiler.
 45. A dynamic compiler as claimed in claim 41, wherein the compiler manager further includes a memory manager that monitors memory available to the compiler.
 46. A dynamic compiler as claimed in claim 41, wherein the execution history recorder records the number of times the fragment of code is executed when the fragment of code is executed by the interpreter.
 47. A dynamic compiler as claimed in claim 41, wherein the dynamic compiler is a multi-threaded system and the compiler runs on a separate thread so the progress of code execution is not blocked.
 48. A dynamic compiler as claimed in claim 41, wherein the execution history recorder is further configured to record from where a transfer of control into the fragment of code came and to where control is transferred out of the fragment of code.
 49. A dynamic compiler as claimed in claim 41, wherein the execution history recorder is further configured to alert the compiler manager when the fragment of code has been executed the threshold number of times.
 50. A dynamic compiler as claimed in claim 41, further comprising: a cache searcher coupled to the interpreter; and a converter device coupled to the interpreter.
 51. A compiler as claimed in claim 50, further comprising a queue coupled to the compiler.
 52. A dynamic compiler as claimed in claim 50, wherein the compiler manager further includes a memory manager that monitors memory available to the compiler.
 53. A dynamic compiler as claimed in claim 50, wherein the execution history recorder records the number of times the fragment of code is executed when the fragment of code is executed by the interpreter.
 54. A dynamic compiler as claimed in claim 50, wherein the dynamic compiler is a multi-threaded system and the compiler runs on a separate thread so the progress of code execution is not blocked.
 55. A dynamic compiler as claimed in claim 50, wherein the execution history recorder is further configured to record from where a transfer of control into the fragment of code came and to where control is transferred out of the fragment of code.
 56. A dynamic compiler as claimed in claim 50, wherein the execution history recorder is further configured to alert the compiler manager when the fragment of code has been executed the threshold number of times.